About the posterior distribution in hidden Markov models with unknown number of states

Elisabeth Gassiat, 

Laboratoire de Mathématique, Université Paris-Sud and CNRS, Orsay, France,

Judith Rousseau,

ENSAE-CREST and CEREMADE, Université Paris-Dauphine, Paris, France.

July 10, 2012 

Abstract 

In this paper, we investigate the asymptotic behaviour of the posterior distribution in 
hidden Markov models (HMMs) when using Bayesian methodology. We obtain a general 
asymptotic result, and give conditions on the prior under which we obtain a rate of conver- 
gence for the posterior distribution of the marginal distributions of the process. We then 
focus on the situation where the hidden Markov chain evolves on a finite state space but 
where the number of hidden states might be larger than the true one. It is known that the 
likelihood ratio test statistic for overfitted HMMs has a non standard behavior and is un- 
bounded. Our conditions on the prior may be seen as a way to penalize parameters to avoid 
this phenomenon. We are then able to define a consistent Bayesian estimator of the number 
of hidden states. We also give a precise description of the situation when the observations 
are i.i.d. and we allow 2 possible hidden states. 

Keywords: Hidden Markov models, number of components, order selection, Bayesian 
statistics, posterior distribution. 

Short title: Asymptotics of the posterior for HMM. 



1 Introduction 



Hidden Markov models are stochastic processes $(X_j, Y_j)_{j\ge 0}$ where $(X_j)_{j\ge 0}$ is a Markov
chain living in a state space $\mathcal{X}$ and, conditionally on $(X_j)_{j\ge 0}$, the $Y_j$'s are independent
with a distribution depending only on $X_j$ and living in $\mathcal{Y}$. The observations are
$Y_{1:n} = (Y_1, \dots, Y_n)$ and the associated states $X_{1:n} = (X_1, \dots, X_n)$ are unobserved. Hidden
Markov models are useful tools to model time series where the observed phenomenon is
driven by a latent Markov chain. They may be seen as a dynamic extension of mixture
models. They have been used successfully in a variety of applications such as economics
(e.g. (Albert and Chib, 1993)), genomics (e.g. (Churchill, 1989)), signal processing and
[...] 1995)), speech recognition (e.g. (Rabiner, 1989)), to name but a few. The books
(MacDonald and Zucchini, 1997), (MacDonald and Zucchini, 2009) and (Cappé et al., 2005)
provide several examples
of applications of HMMs and give a recent (for the latter) state of the art in the statistical 
analysis of HMMs. When the state space X of the hidden Markov chain is finite, the num- 
ber of hidden states induces a classification of the regimes in which the time series evolves. 
They often have a practical interpretation in the modelling of the underlying phenomenon. 
It is thus important to be able to infer from data both the number of hidden states (which we
call the order of the HMM), when it is not known in advance, and the associated
parameters.

In the frequentist literature, penalized likelihood methods have been proposed to estimate
the order of a HMM, using for instance Bayesian information criteria (BIC for short). These
methods were applied for instance in (Leroux and Putterman, 1992), (Rydén et al., 1998),
but without theoretical consistency results. Later, it was observed that the likelihood
ratio statistic is unbounded, even in the very simple situation where one wants to test between
1 or 2 hidden states, see (Gassiat and Keribin, 2000). The question whether BIC penalized
likelihood methods lead to consistent order estimation stayed open. Using tools borrowed
from information theory, it has been possible to calibrate heavier penalties in maximum
likelihood methods to obtain consistent estimators of the order, see (Gassiat and Boucheron,
2003), (Chambaz et al., 2009). The use of penalized marginal pseudo likelihood was also
proved to lead to weakly consistent estimators by (Gassiat, 2002).

On the Bayesian side, various methods were proposed to deal with an unknown number
of hidden states, but no theoretical result exists to validate the methods. Reversible jump
methods have been built, leading to satisfactory results on simulation and real data, see
(Boys and Henderson, 2004), (Green and Richardson, 2002), (Robert et al., 2000), (Spezia,
2010). The ideas of variational Bayesian methods were developed in (McGrory and Titterington,
2009). Recently, one of the authors proposed a theoretical analysis of the posterior distribution
for overfitted mixtures, see (Rousseau and Mengersen, 2011). In that paper, it is proved
that one may choose the prior in such a way that extra components are emptied, or in such
a way that extra components merge with true ones. More precisely, if a Dirichlet prior
$\mathcal{D}(\alpha_1, \dots, \alpha_k)$ is considered on the $k$ weights of the mixture components, small values of the
$\alpha_j$'s imply that the posterior distribution will tend to empty the extra components of the
mixture when the true distribution has a smaller number, say $k_0 < k$, of true components.

One aim of our paper is to understand if such an analysis may be extended to dynamic 
mixtures, that is to HMMs. Since HMMs are much more complicated models regarding order 
estimation, with unbounded likelihood ratio statistics that are still not well understood, our 
results do not cover all choices of prior distributions to empty extra components or to merge 
them with true ones. Only this last possibility is fully understood. Consider a finite state
space HMM, with $k$ states and with independent Dirichlet prior distributions $\mathcal{D}(\alpha_1, \dots, \alpha_k)$
on each row of the transition matrix of the latent Markov chain. We prove that if the
parameters $\alpha_j$ are large enough, extra components merge with true ones. We are also able
to propose a consistent Bayesian estimator of the number of hidden states, without using
variable dimension algorithms such as reversible jump algorithms, which are often difficult
to implement efficiently. We are thus able to give guidelines to choose the prior in such a
way that the posterior leads to interpretable results, by choosing large enough parameters in
the Dirichlet prior.

In Section 2 we give a general theorem on the asymptotic behaviour of the posterior
distribution. To our knowledge, this is the first general theoretical result for HMM Bayesian
estimation. Though (Ghosal and van der Vaart, 2006) give rates of convergence for the
posterior in possibly dependent observation models, they cannot be applied to the order
estimation problem, as explained in Section 2.2. In Section 3 we consider the case of finite
state space HMMs. Using the general result of Section 2 we explain how it is possible
to choose the prior in such a way that the posterior gives consistent estimation of the 
marginal distributions. In this case we also obtain convergence rates. We are then able 
to derive a consistent Bayesian estimator of the number of hidden states, which does not 
require a prior on the number of states nor the computation of marginal likelihoods in the 
different candidate models. To our knowledge, this is the first consistency result on Bayesian 
order estimation in the case of hidden Markov models. In the specific situation where the 
overfitting is by only one state and the observations are i.i.d., we are able to describe more 
precisely what choice of the prior leads to the merging of the two states together with 
convergence rates. Proofs are given in Section 4.

2 Posterior concentration rates for HMMs: a general result



Since we could not find in the literature any result on the asymptotic concentration of the 
posterior distribution in HMM models we first present a general theorem where the posterior 
concentration is proved in such models. We first describe the general setting and we give 
some notations that are used throughout the paper. 



2.1 Setting and notations 

Recall that HMMs model pairs $(X_i, Y_i)$, $i = 1, \dots, n$, where $(X_i)_{i\ge 1}$ is the unobserved Markov
chain living on a state space $\mathcal{X}$ and the observations $(Y_i)_{i\ge 1}$ are conditionally independent
given the $(X_i)_{i\ge 1}$ and live in $\mathcal{Y}$. The spaces $\mathcal{X}$, $\mathcal{Y}$ can be general and we only assume that
they are Polish spaces endowed with their Borel $\sigma$-fields. The hidden Markov chain $(X_i)_{i\ge 1}$
has a Markov kernel $Q_\theta$, $\theta \in \Theta$, where $\Theta$ is a subset of a Euclidean space, and the conditional
distribution of $Y_i$ given $X_i$ has a density with respect to some given measure $\nu$ on $\mathcal{Y}$, denoted by
$g_\theta(y|x)$, $x \in \mathcal{X}$, $\theta \in \Theta$. With an abuse of notation we also denote by $\nu$ the product measure $\nu^{\otimes n}$
on $\mathcal{Y}^n$. We assume that the Markov kernels $Q_\theta$ admit a (not necessarily unique) stationary
distribution $\mu_\theta$, for each $\theta \in \Theta$. We write $\mathbb{P}_\theta$ for the probability distribution of the stationary
HMM $(X_j, Y_j)_{j\ge 1}$ with parameter $\theta$. That is, for any integer $n$, any measurable set $A$ in
the Borel $\sigma$-field of $\mathcal{X}^n \times \mathcal{Y}^n$:

$$\mathbb{P}_\theta\big((X_1,\dots,X_n,Y_1,\dots,Y_n) \in A\big) = \int_A \mu_\theta(dx_1) \prod_{i=1}^{n-1} Q_\theta(x_i, dx_{i+1}) \prod_{i=1}^{n} g_\theta(y_i|x_i)\, \nu(dy_1)\cdots\nu(dy_n). \tag{1}$$

Thus for any integer $n$, under $\mathbb{P}_\theta$, $Y_{1:n} = (Y_1, \dots, Y_n)$ has a probability density with respect
to $\nu(dy_1)\cdots\nu(dy_n)$ equal to

$$f_{n,\theta}(y_1,\dots,y_n) = \int_{\mathcal{X}^n} \mu_\theta(dx_1) \prod_{i=1}^{n-1} Q_\theta(x_i, dx_{i+1}) \prod_{i=1}^{n} g_\theta(y_i|x_i). \tag{2}$$



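We denote by $\pi$ the prior distribution on $\Theta$. As is often the case in Bayesian analysis of HMMs,
instead of computing the stationary distribution $\mu_\theta$ of the hidden Markov chain $X$ for each
$\theta$, we consider a probability distribution $\pi_{\mathcal{X}}$ on the unobserved initial state $X_0$. Denote by
$\ell_n(\theta, x)$ the log-likelihood starting from $x$, for all $x \in \mathcal{X}$, which is given by

$$\ell_n(\theta, x) = \log\left[\int_{\mathcal{X}^n} Q_\theta(x, dx_1) \prod_{i=1}^{n-1} Q_\theta(x_i, dx_{i+1}) \prod_{i=1}^{n} g_\theta(Y_i|x_i)\right].$$

Similarly, the log-likelihood starting from a distribution $\pi_0$ on $\mathcal{X}$ is denoted $\ell_n(\theta, \pi_0)$, i.e.

$$\ell_n(\theta, \pi_0) = \log\left[\int_{\mathcal{X}} e^{\ell_n(\theta, x)}\, \pi_0(dx)\right].$$

The posterior distribution can then be written as

$$\mathbb{P}^{\pi}(A \mid Y_{1:n}) = \frac{\int_{A \times \mathcal{X}} e^{\ell_n(\theta, x)}\, \pi(d\theta)\, \pi_{\mathcal{X}}(dx)}{\int_{\Theta \times \mathcal{X}} e^{\ell_n(\theta, x)}\, \pi(d\theta)\, \pi_{\mathcal{X}}(dx)} \tag{3}$$

for any Borel set $A \subset \Theta$.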

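We shall also use the notation $\mathbb{P}_{\theta,x}$ for the probability distribution of the HMM starting
from $x$, that is, for any integer $n$, any measurable set $A$ in the Borel $\sigma$-field of $\mathcal{X}^n \times \mathcal{Y}^n$:

$$\mathbb{P}_{\theta,x}\big((X_1,\dots,X_n,Y_1,\dots,Y_n) \in A\big) = \int_A Q_\theta(x, dx_1) \prod_{i=1}^{n-1} Q_\theta(x_i, dx_{i+1}) \prod_{i=1}^{n} g_\theta(y_i|x_i)\, \nu(dy_1)\cdots\nu(dy_n),$$

so that for any $\theta \in \Theta$,

$$\mathbb{P}_\theta = \int_{\mathcal{X}} \mathbb{P}_{\theta,x}\, \mu_\theta(dx).$$

We denote by $E_\theta$ the expectation under $\mathbb{P}_\theta$ and by $E_{\theta,x}$ the expectation under $\mathbb{P}_{\theta,x}$.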

We assume throughout the paper that we are given a stationary HMM $(X_j, Y_j)_{j\ge 1}$ with
distribution $\mathbb{P}_{\theta_0}$ for some $\theta_0 \in \Theta$. We will be interested in the asymptotic behaviour of the
posterior distribution of finite marginals of the process. Indeed, marginals (of dimension at
least 2) capture the transition of the Markov chain together with the emission parameters,
as we shall explain below. Thus we define for any integer $l \ge 2$, and for any $\theta \in \Theta$,
the probability density $f_{l,\theta}$ of $(Y_1, \dots, Y_l)$ under $\mathbb{P}_\theta$. For any parameter $\theta$, $f_{l,\theta}$ is a
mixture in $\mathcal{Y}^l$ of product probability measures, see equation (2). When such mixtures are
identifiable, knowledge of $f_{l,\theta}$ leads to the knowledge of the mixing measure, which itself
gives the knowledge of the distribution of the hidden Markov chain. Mixtures of products of
Gaussian distributions are identifiable, for instance, but many other families of mixtures are
identifiable, see (MacLachlan and Peel, 2000), (Hall and Zhou, 2003), (Allman et al., 2009).



For any $\theta \in \Theta$, since the total variation norm between probability measures is bounded by
2, it is possible to define real numbers $\rho_\theta \ge 1$ and $R_\theta > 0$ such that, for any integer $m$, any
$x \in \mathcal{X}$,

$$\|Q_\theta^m(x, \cdot) - \mu_\theta\|_{TV} \le R_\theta\, \rho_\theta^{-m} \tag{4}$$

where $\|\cdot\|_{TV}$ is the total variation norm. If it is possible to set $\rho_\theta > 1$, the Markov chain
$(X_n)_{n\ge 1}$ is uniformly ergodic and $\mu_\theta$ is its unique stationary distribution.
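
For a finite state space, a bound of the form (4) can be checked numerically. The sketch below is only an illustration under stated assumptions: the two-state matrix `Q` is a made-up example, the distance is the total variation distance normalised to lie in [0, 1] (half the L1 norm, whereas the text uses a norm bounded by 2), and the constant `rho = 1 / (1 - sum_j min_i q_ij)` is the classical Doeblin-type choice, assumed here rather than derived.

```python
import numpy as np


def stationary_distribution(Q):
    """Stationary distribution of Q (left eigenvector for eigenvalue 1, normalised)."""
    eigvals, eigvecs = np.linalg.eig(Q.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    mu = np.real(eigvecs[:, idx])
    return mu / mu.sum()


def worst_case_tv(Q, m):
    """Max over starting states x of the TV distance between Q^m(x, .) and mu."""
    mu = stationary_distribution(Q)
    Qm = np.linalg.matrix_power(Q, m)
    return 0.5 * np.abs(Qm - mu).sum(axis=1).max()


def doeblin_rho(Q):
    """rho = 1 / (1 - sum_j min_i q_ij); infinite when all rows coincide."""
    delta = Q.min(axis=0).sum()
    return np.inf if np.isclose(delta, 1.0) else 1.0 / (1.0 - delta)


# Hypothetical two-state chain, used only for illustration.
Q = np.array([[0.9, 0.1],
              [0.2, 0.8]])
rho = doeblin_rho(Q)
for m in (1, 5, 10):
    # Distance to stationarity after m steps, next to the geometric bound rho^(-m).
    print(m, worst_case_tv(Q, m), rho ** (-m))
```

When some column of the transition matrix contains a zero and the others are nearly so, `rho` gets close to 1 and the bound becomes uninformative, which is the slow mixing regime discussed later in the paper.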

Throughout the paper $\nabla_\theta h$ denotes the gradient vector of the function $h$ when considered
as a function of $\theta$, and $D^2_\theta h$ its Hessian matrix. We denote by $B_d(\gamma, \epsilon)$ the $d$-dimensional
ball centered at $\gamma$ with radius $\epsilon$, when $\gamma \in \mathbb{R}^d$. The notation $a_n \gtrsim b_n$ means that $a_n$ is larger
than $b_n$ up to a positive constant that does not depend on $n$.

2.2 General HMMs 

We now derive posterior concentration rates in the framework of hidden Markov models.
This setup follows the ideas of (Ghosal and van der Vaart, 2006). The proof of Theorem 1
is given in Section 4.

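Theorem 1 Assume

• (A0). $\rho_{\theta_0} > 1$.

• (A1). There exists $S_n \subset \Theta \times \mathcal{X}$, $D > 0$, $A > 0$ and $x_0$ in $\mathcal{X}$ such that for any integer
$n$, $\mathbb{P}_{\theta_0}$-a.s.,

$$\ell_n(\theta_0) - \ell_n(\theta_0, x_0) \le A,$$

and for any sequence $(C_n)_{n\ge 1}$ of real numbers tending to $+\infty$

$$\sup_{(\theta,x) \in S_n} \mathbb{P}_{\theta_0}\big[\ell_n(\theta, x) - \ell_n(\theta_0, x_0) < -C_n\big] = o(1), \qquad \pi \otimes \pi_{\mathcal{X}}[S_n] \gtrsim n^{-D/2}.$$

• (A2). There exists a sequence $(\mathcal{F}_n)_{n\ge 1}$ of subsets of $\Theta$ such that

$$\pi(\mathcal{F}_n^c) = o(n^{-D/2}).$$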

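• (A3). There exists $\delta_0 > 0$ and $M > 0$ such that for all $\delta_0 > \delta > 0$,

$N(\delta, \mathcal{F}_n, d_l(\cdot,\cdot)) \le$

where $N(\delta, \mathcal{F}_n, d_l(\cdot,\cdot))$ is the smallest number of $\theta_j \in \mathcal{F}_n$ such that for all $\theta \in \mathcal{F}_n$ there
exists a $\theta_j$ with $d_l(\theta_j, \theta) \le \delta$. Here $d_l(\theta, \theta_j) = \|f_{l,\theta} - f_{l,\theta_j}\|_1 = \int_{\mathcal{Y}^l} |f_{l,\theta} - f_{l,\theta_j}|(y)\, \nu(dy)$.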



. M 
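Then there exists $K$ large enough such that

$$\mathbb{P}^{\pi}\left[\theta : \|f_{l,\theta} - f_{l,\theta_0}\|_1\, \frac{\rho_\theta - 1}{2R_\theta + \rho_\theta - 1} \ge K \sqrt{\frac{\log n}{n}} \ \Big|\ Y_{1:n}\right] = o_{\mathbb{P}_{\theta_0}}(1).$$

If for any positive integer $m$, any positive $\epsilon$, one defines:

$$A_{n,m}(\epsilon) = \left\{\theta \in \mathcal{F}_n : m\epsilon \le \|f_{l,\theta} - f_{l,\theta_0}\|_1\, \frac{\rho_\theta - 1}{2R_\theta + \rho_\theta - 1} \le (m+1)\epsilon\right\}$$

and if moreover

• (A4). For any sequence $\epsilon_n$ tending to 0 such that for any $n$, $n\epsilon_n^2 \ge c > 0$, there
exist $C_1$, $C_2$ and $m_0$ such that for any $m \le m_0 \log n$, $\pi(A_{n,m}(\epsilon_n)) \le C_1 (m\epsilon_n)^{D}$ and
$N(\dots, A_{n,m}(\epsilon_n), d_l(\cdot,\cdot)) \le C_2\, m^{\dots}$,

then, for any sequence $M_n$ tending to infinity,

$$\mathbb{P}^{\pi}\left[\theta : \|f_{l,\theta} - f_{l,\theta_0}\|_1\, \frac{\rho_\theta - 1}{2R_\theta + \rho_\theta - 1} \ge \frac{M_n}{\sqrt{n}} \ \Big|\ Y_{1:n}\right] = o_{\mathbb{P}_{\theta_0}}(1).$$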



Theorem 1 gives the posterior concentration rate of $\|f_{l,\theta} - f_{l,\theta_0}\|_1$ up to the parameter
$\frac{\rho_\theta - 1}{2R_\theta + \rho_\theta - 1}$. This is in sharp contrast with the results in (Ghosal and van der Vaart, 2006),
though the proof of our theorem follows the same lines. In (Ghosal and van der Vaart, 2006),
applications to Markov chains or to Gaussian time series of their general theorem
use assumptions that, in some sense, bound the coefficient $\frac{\rho_\theta - 1}{2R_\theta + \rho_\theta - 1}$ from below. This
corresponds to choosing a prior whose support in $\Theta$ is included in a set where
$\frac{\rho_\theta - 1}{2R_\theta + \rho_\theta - 1}$ is
uniformly bounded from below. If such a prior is considered, then Theorem 1 implies that
the posterior distribution concentrates on $\{\theta;\ f_{l,\theta} = f_{l,\theta_0}\}$ and also provides a concentration
rate of the posterior distribution of order $O((\log n/n)^{1/2})$ or $O(1/\sqrt{n})$, in terms of the $L_1$
norm on $f_{l,\theta} - f_{l,\theta_0}$. Even in the simple case of finite state space HMMs, which are extensively
used in practice, this type of prior would be awkward. We investigate this case in detail
in Section 3.

In the case of over-fitted HMMs with finite state space, i.e. when $\theta_0$ corresponds to a
HMM associated with $k_0$ states while the model considers HMMs associated with $k > k_0$
states, the parameter set has to contain all possible transition matrices and in any neighbourhood
$\{\theta : \|f_{l,\theta} - f_{l,\theta_0}\|_1 \le \epsilon\}$ there exist parameters $\theta$ such that $\rho_\theta = 1$. Thus, one
has to allow $\rho_\theta$ to be arbitrarily close to 1. We will see that a good choice of the prior,
however, acting as soft thresholding, leads to the concentration of the posterior distribution
around $f_{l,\theta_0}$, in terms of $\|f_{l,\theta} - f_{l,\theta_0}\|_1$ alone, at a rate slower than $(\log n/n)^{1/2}$. In case
$k = k_0$, that is when the number of hidden states is known, with a good choice of prior, the
posterior concentrates around $f_{l,\theta_0}$ at rate $1/\sqrt{n}$. In fact, the understanding of the geometry
of the neighbourhoods $\{\theta : \|f_{l,\theta} - f_{l,\theta_0}\|_1 \le \epsilon\}$ is needed to be able to verify whether (A4)
holds. It is also needed to understand whether states merge or not, and to be able to build
an order estimator. Such an understanding is provided in Section 3.2.

Assumption (A0) implies that the hidden Markov chain $X$ is uniformly ergodic.
Assumptions (A2) and (A3) are similar in spirit to those considered in general theorems on
posterior consistency or posterior convergence rates, see for instance (Ghosh and Ramamoorthi,
2003) and (Ghosal and van der Vaart, 2006). Condition (A1) is close to the Kullback-Leibler
condition as in (Ghosal and van der Vaart, 2006), adapted to a parametric context. A
nonparametric formulation could also have been provided, replacing $C_n$ with $n\epsilon_n^2$ in (A1), $n^{-D/2}$
by $e^{-n\epsilon_n^2}$ in (A1) and (A2), and $M$ by $n\epsilon_n^2/\log n$ in (A3). However, estimation in HMMs
has mostly been studied in the parametric setting, except for very particular models such as
deconvolution models, since identifiability is a key issue for nonparametric HMMs.

In the following section, we explain how condition (A1) can be verified under conditions
that are classical in the HMM literature.

2.3 About condition (A1)



Here we assume that $\mathcal{X}$ is compact, and that the transition kernels $Q_\theta$ are absolutely
continuous with respect to a measure $\mu$ such that $\mu(\mathcal{X}) = 1$, for all $\theta$ in a neighborhood of $\theta_0$.
We denote by $q_\theta(\cdot, \cdot)$ the density of $Q_\theta$ with respect to $\mu$ for $\theta$ in this neighborhood, and define

$$\sigma_-(\theta) = \inf_{x, x' \in \mathcal{X}} q_\theta(x, x'), \qquad \sigma_+(\theta) = \sup_{x, x' \in \mathcal{X}} q_\theta(x, x').$$

Then, by Corollary 1 of (Douc et al., 2004), it is possible to set $R_\theta = 1$ and $\rho_\theta = \left(1 - \frac{\sigma_-(\theta)}{\sigma_+(\theta)}\right)^{-1}$.
Also, following the proof of Lemma 2 of (Douc et al., 2004) we find that, if $\rho_{\theta_0} > 1$, then

$$\ell_n(\theta_0) - \ell_n(\theta_0, x_0) \le \dots$$

To verify assumption (A1), assume that there exists a subset $V \subset \Theta$ containing $\theta_0$ such that
the densities $q_\theta(x, x')$ and $g_\theta(y|x)$ are smooth as functions of $\theta$ on $V$ (i.e. satisfy assumptions
(A6)-(A8) of (Douc et al., 2004) on $V$) and such that

$$\inf_{\theta \in V} \rho_\theta > 1. \tag{5}$$

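Then, for any $\theta$, $x$, $x_0$,

$$\ell_n(\theta, x) - \ell_n(\theta_0, x_0) = \ell_n(\theta, x) - \ell_n(\theta, x_0) + \ell_n(\theta, x_0) - \ell_n(\theta_0, x_0) \tag{6}$$

and following the proof of Lemma 2 of (Douc et al., 2004) gives that, if (A0) and (5) hold,

$$\sup_{\theta \in V} \sup_{x \in \mathcal{X}} |\ell_n(\theta, x) - \ell_n(\theta, x_0)| \le 2 \sup_{\theta \in V} \dots$$

where the bound on the right-hand side depends on $\theta$ only through $\rho_\theta$. Now for $\theta \in V$,

$$\ell_n(\theta, x_0) - \ell_n(\theta_0, x_0) = (\theta - \theta_0)^T \nabla_\theta \ell_n(\theta_0, x_0) + \int_0^1 (\theta - \theta_0)^T D^2_\theta \ell_n(\theta_0 + u(\theta - \theta_0), x_0)\, (\theta - \theta_0)\, (1 - u)\, du. \tag{7}$$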

(7) 



Following Theorem 2 in (Douc et al., 2004), $n^{-1/2} \nabla_\theta \ell_n(\theta_0, x)$ converges in distribution
under $\mathbb{P}_{\theta_0}$ to $\mathcal{N}(0, V_0)$ for some positive definite matrix $V_0$, and following Theorem 3 in
(Douc et al., 2004), we get that $\sup_{\theta \in V} n^{-1} D^2_\theta \ell_n(\theta, x_0)$ converges $\mathbb{P}_{\theta_0}$-a.s. to $V_0$. Thus, we
may set:

$$S_n = \{\theta \in V;\ \|\theta - \theta_0\| \le 1/\sqrt{n}\} \times \mathcal{X}$$

so that

$$\sup_{(\theta, x) \in S_n} \mathbb{P}_{\theta_0}\big[\ell_n(\theta, x) - \ell_n(\theta_0, x_0) < -C_n\big] = o(1)$$

follows from (6) and (7). The second part of (A1) is then satisfied as soon as $\pi(S_n) \gtrsim n^{-D/2}$,
which is true for instance if $V$ is a neighbourhood of $\theta_0$ and if the prior has a density with
respect to Lebesgue measure which is lower bounded by a positive constant on $V$. Note that
the freedom in the choice of $V$ implies that (A1) can be verified in situations where the true
distribution can be approximated by $\mathbb{P}_\theta$ such that $\rho_\theta$ is arbitrarily close to 1, as long as it
is possible to choose paths in $\Theta$ to approximate $\theta_0$ that avoid such pathological $\theta$'s. This is
illustrated in the case of finite state space HMMs in the following section.



3 Finite state space 

Here we assume that $\mathcal{X} = \{1, \dots, k\}$. We may take $\mu$ as the uniform probability measure
on $\{1, \dots, k\}$. We first describe the setting in this case, and then we prove that, under
some general assumptions, Theorem 1 applies. Moreover, we show how some choices of the
prior give posterior concentration rates for the finite marginals without additional mixing
coefficient. The results obtained in Section 3.1 are valid both when the true distribution
has $k_0 = k$ different states and when it has a smaller number of states.

3.1 Posterior convergence rates for the finite marginals

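$Q_\theta$ denotes the transition matrix $(q_{ij})_{i,j \le k}$ and $\theta = (q_{ij}, 1 \le i \le k, 1 \le j \le k-1;\ \gamma_1, \dots, \gamma_k)$
with $\gamma_j \in \Gamma \subset \mathbb{R}^d$ such that $g_\theta(y|x) = g_{\gamma_x}(y)$, $x \in \mathcal{X} = \{1, \dots, k\}$, for some family of
probability densities $(g_\gamma)_{\gamma \in \Gamma}$ with respect to $\nu$. We denote by $\Theta_k$ the parameter space.

Let $\mathcal{M}_k$ be the set of all possible probability distributions of $(Y_n)_{n\ge 1}$ under $\mathbb{P}_\theta$ for all
$\theta \in \Theta_k$. We say that the HMM has order $k_0$ if the probability distribution of $(Y_n)_{n\ge 1}$
under $\mathbb{P}_\theta$ is in $\mathcal{M}_{k_0}$ and not in $\mathcal{M}_k$ for all $k < k_0$. Notice that a HMM of order $k_0$ may
be represented as a HMM of order $k$ for any $k > k_0$. Let $Q^0 = (q^0_{ij})$ be a $k_0 \times k_0$ transition
matrix, and $(\gamma^0_1, \dots, \gamma^0_{k_0}) \in \Gamma^{k_0}$ be parameters that define a HMM of order $k_0$. Then,
$\theta = (q_{ij}, 1 \le i, j \le k;\ \gamma^0_1, \dots, \gamma^0_{k_0}, \dots, \gamma^0_{k_0}) \in \Theta_k$ with $Q = (q_{ij}, 1 \le i, j \le k)$ such that:

$$q_{ij} = q^0_{ij},\ \ i, j < k_0, \qquad \sum_{l=k_0}^{k} q_{il} = q^0_{i k_0},\ \ i < k_0, \quad \text{and} \quad \sum_{l=k_0}^{k} q_{il} = q^0_{k_0 k_0},\ \ i \ge k_0 \tag{8}$$

gives $\mathbb{P}_\theta = \mathbb{P}_{\theta_0}$. Indeed, let $(X_n)_{n\ge 1}$ be a Markov chain on $\{1, \dots, k\}$ with transition matrix
$Q$. Let $Z$ be the function from $\{1, \dots, k\}$ to $\{1, \dots, k_0\}$ defined by $Z(x) = x$ if $x \le k_0$ and
$Z(x) = k_0$ if $x > k_0$. Then $(Z(X_n))_{n\ge 1}$ is a Markov chain on $\{1, \dots, k_0\}$ with transition
matrix $Q^0$.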
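
The embedding of an order-$k_0$ HMM into a model with $k > k_0$ states can be made concrete with a small numerical sketch. The code below is an illustration under assumptions, not the paper's construction in full generality: it duplicates the last state of a hypothetical matrix `Q0`, splits its incoming mass with fixed `weights` (many other splits are possible), and checks that summing the columns of the duplicated states gives back the rows of `Q0`, which is the lumping property used for the map $Z$.

```python
import numpy as np


def overfit_transition_matrix(Q0, k, weights=None):
    """Embed a k0-state transition matrix Q0 into a k-state one by duplicating its last state."""
    k0 = Q0.shape[0]
    n_copies = k - k0 + 1  # states k0, ..., k (1-based) all play the role of state k0
    if weights is None:
        weights = np.full(n_copies, 1.0 / n_copies)
    Q = np.zeros((k, k))
    for i in range(k):
        z = min(i, k0 - 1)                       # Z(i) in 0-based indexing
        Q[i, :k0 - 1] = Q0[z, :k0 - 1]           # transitions to the untouched states
        Q[i, k0 - 1:] = Q0[z, k0 - 1] * weights  # mass of state k0 split among its copies
    return Q


def lumped_rows(Q, k0):
    """Sum the columns of the duplicated states so that each row lives on {1, ..., k0}."""
    return np.hstack([Q[:, :k0 - 1], Q[:, k0 - 1:].sum(axis=1, keepdims=True)])


# Hypothetical 2-state chain embedded in a 4-state model.
Q0 = np.array([[0.7, 0.3],
               [0.4, 0.6]])
Q = overfit_transition_matrix(Q0, k=4)
print(lumped_rows(Q, k0=2))
```

Every row of the lumped matrix coincides with the row of `Q0` indexed by $Z(i)$, so the image chain has transition matrix `Q0` whatever the starting state.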

In the following we parametrize the transition matrices on $\{1, \dots, k\}$ as $(q_{ij})_{1 \le i \le k, 1 \le j \le k-1}$
(implying that $q_{ik} = 1 - \sum_{j=1}^{k-1} q_{ij}$ for $i \le k$) and we denote by $\Delta_k$ the set of probability
mass functions $\Delta_k = \{(u_1, \dots, u_{k-1}) : u_1 \ge 0, \dots, u_{k-1} \ge 0, \sum_{i=1}^{k-1} u_i \le 1\}$. We shall also
use the set of positive probability mass functions $\Delta_k^0 = \{(u_1, \dots, u_{k-1}) : u_1 > 0, \dots, u_{k-1} >
0, \sum_{i=1}^{k-1} u_i < 1\}$. Thus, we may set $\Theta_k = \Delta_k^k \times \Gamma^k$.

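Any Markov chain on $\{1, \dots, k\}$ admits a stationary distribution; if $Q_\theta$ admits more
than one stationary distribution, we choose one that we denote $\mu_\theta$. Besides, (4) holds with
$R_\theta = 1$ and

$$\rho_\theta = \Big(1 - \sum_{j=1}^{k} \inf_{1 \le i \le k} q_{ij}\Big)^{-1},$$

so that, as soon as the transition matrix $Q_\theta$ has positive entries, $\rho_\theta > 1$. For any
$\theta = (q_{ij}, 1 \le i \le k, 1 \le j \le k-1;\ \gamma_1, \dots, \gamma_k) \in \Theta_k$, any $y = (y_1, \dots, y_l)$ in $\mathcal{Y}^l$,

$$f_{l,\theta}(y) = \sum_{1 \le i_1, \dots, i_l \le k} \mu_\theta(i_1)\, q_{i_1 i_2} \cdots q_{i_{l-1} i_l}\, g_{\gamma_{i_1}}(y_1) \cdots g_{\gamma_{i_l}}(y_l). \tag{9}$$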
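
The finite marginal density is a sum over all state sequences of length $l$, which can be evaluated either by direct enumeration or by the usual forward recursion. The sketch below compares the two on a toy example. It is illustrative only: the emission densities are assumed to be Gaussian with unit variance and mean $\gamma_i$, and the values of `mu`, `Q`, `gammas` and `y` are made up.

```python
import itertools

import numpy as np


def normal_pdf(y, mean):
    """Density of N(mean, 1) at y (assumed emission family, for illustration)."""
    return np.exp(-0.5 * (y - mean) ** 2) / np.sqrt(2.0 * np.pi)


def marginal_bruteforce(y, mu, Q, gammas):
    """Sum over all state sequences (i_1, ..., i_l) of the path probability times emissions."""
    k, total = len(mu), 0.0
    for path in itertools.product(range(k), repeat=len(y)):
        p = mu[path[0]]
        for a, b in zip(path[:-1], path[1:]):
            p *= Q[a, b]
        for t, i in enumerate(path):
            p *= normal_pdf(y[t], gammas[i])
        total += p
    return total


def marginal_forward(y, mu, Q, gammas):
    """Same quantity computed by the forward recursion in O(l k^2) operations."""
    alpha = mu * normal_pdf(y[0], gammas)
    for t in range(1, len(y)):
        alpha = (alpha @ Q) * normal_pdf(y[t], gammas)
    return alpha.sum()


# Hypothetical parameters: 2 states, marginal of dimension l = 3.
mu = np.array([2.0 / 3.0, 1.0 / 3.0])
Q = np.array([[0.9, 0.1],
              [0.2, 0.8]])
gammas = np.array([0.0, 3.0])
y = np.array([0.1, 2.7, 3.2])
print(marginal_bruteforce(y, mu, Q, gammas), marginal_forward(y, mu, Q, gammas))
```

The enumeration costs $k^l$ terms and is only meant as a check; the forward recursion is what one would use in practice.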

Let $\pi(u_1, \dots, u_{k-1})$ be a prior density with respect to the Lebesgue measure on $\Delta_k$,
and let $\omega(\gamma)$ be a prior density on $\Gamma$ (with respect to the Lebesgue measure on $\mathbb{R}^d$). We
consider prior distributions such that the rows of the transition matrix $Q$ are independently
distributed from $\pi$ and independent of the component parameters $\gamma_i$, $i = 1, \dots, k$, which are
independently distributed from $\omega$. Hence the prior density (with respect to the Lebesgue
measure) is equal to $\pi_k = \pi^{\otimes k} \otimes \omega^{\otimes k}$. In this section we use a Dirichlet type prior, see
assumption (F1) below, or an exponential type prior, see assumption (EF1) below, on the
transition parameters $(q_{ij}, j \le k)$.

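Theorem 2 Let $\theta_0 = (q^0_{ij}, 1 \le i \le k_0, 1 \le j \le k_0 - 1;\ \gamma^0_1, \dots, \gamma^0_{k_0}) \in \Theta_{k_0}$ be the parameter
of a HMM of order $k_0 \le k$. Assume that

• (F0) $q^0_{ij} > 0$, $1 \le i \le k_0$, $1 \le j \le k_0$.

• (F1) $\pi$ is continuous and positive on $\Delta_k^0$, and there exists $C$, $\alpha_1 > 0, \dots, \alpha_k > 0$ such
that (Dirichlet type priors):

$$\forall (u_1, \dots, u_{k-1}) \in \Delta_k^0, \quad u_k = 1 - \sum_{i=1}^{k-1} u_i, \quad 0 < \pi(u_1, \dots, u_{k-1}) \le C\, u_1^{\alpha_1 - 1} \cdots u_k^{\alpha_k - 1},$$

and $\omega$ is continuous and positive on $\Gamma$.

• (F2) The function $\gamma \mapsto g_\gamma(y)$ is twice continuously differentiable in $\Gamma$, and for any
$\gamma \in \Gamma$, there exists $\epsilon > 0$ such that

$$\int \sup_{\gamma' \in B_d(\gamma, \epsilon)} \|\nabla_\gamma \log g_{\gamma'}(y)\|^2\, g_\gamma(y)\, \nu(dy) < +\infty, \qquad \int \sup_{\gamma' \in B_d(\gamma, \epsilon)} \|D^2_\gamma \log g_{\gamma'}(y)\|^2\, g_\gamma(y)\, \nu(dy) < +\infty,$$

$\sup_{\gamma' \in B_d(\gamma, \epsilon)} \|\nabla_\gamma g_{\gamma'}(y)\| \in L_1(\nu)$ and $\sup_{\gamma' \in B_d(\gamma, \epsilon)} \|D^2_\gamma g_{\gamma'}(y)\| \in L_1(\nu)$.

• (F3) There exist $a > 0$ and $b > 0$ such that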



sup 

||7||<™'> 



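Then, there exists $K$ large enough such that

$$\mathbb{P}^{\pi}\left[\theta : \|f_{l,\theta} - f_{l,\theta_0}\|_1 (\rho_\theta - 1) \ge K \sqrt{\frac{\log n}{n}} \ \Big|\ Y_{1:n}\right] = o_{\mathbb{P}_{\theta_0}}(1)$$

where $\rho_\theta = \big(1 - \sum_{j=1}^{k} \inf_{1 \le i \le k} q_{ij}\big)^{-1}$. If moreover $\bar\alpha := \sum_{1 \le i \le k} \alpha_i > k(k - 1 + d)$, then

$$\mathbb{P}^{\pi}\left[\|f_{l,\theta} - f_{l,\theta_0}\|_1 \ge 2K\, n^{-\frac{\bar\alpha - k(k-1+d)}{2\bar\alpha}} (\log n) \ \Big|\ Y_{1:n}\right] = o_{\mathbb{P}_{\theta_0}}(1).$$

If we replace (F1) by

• (EF1) $\pi$ is continuous and positive on $\Delta_k^0$, and there exists $C$ such that (exponential
type priors):

$$\forall (u_1, \dots, u_{k-1}) \in \Delta_k^0, \quad u_k = 1 - \sum_{i=1}^{k-1} u_i, \quad 0 < \pi(u_1, \dots, u_{k-1}) \le C \exp(-C/u_1) \cdots \exp(-C/u_k)$$

and $\omega$ is continuous and positive on $\Gamma$,

then there exists $K$ large enough such that

$$\mathbb{P}^{\pi}\left[\|f_{l,\theta} - f_{l,\theta_0}\|_1 \ge 2K\, n^{-1/2} (\log n)^{3/2} \ \Big|\ Y_{1:n}\right] = o_{\mathbb{P}_{\theta_0}}(1).$$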



The proof is presented in Appendix 4.2. The idea behind the proof is that large values
of the $\alpha_i$'s make it possible to avoid slow mixing Markov chains while keeping them in the model.

Theorem 2 provides guidelines to choose the prior. Indeed, if a Dirichlet $\mathcal{D}(\alpha_1, \dots, \alpha_k)$
prior is considered on each row of the transition matrix of the hidden Markov chain, then
choosing large enough values for the $\alpha_i$'s ensures a consistent posterior distribution in terms
of the $L_1$ distance on the marginals. If one chooses exponential type priors, it is possible
to get, up to a $\log n$ factor, the posterior concentration rate of order $1/\sqrt{n}$ for the finite
marginals. Interestingly, this is quite different from what happens in the case of overspecified
mixtures as described by (Rousseau and Mengersen, 2011). In the case of independent
mixture models, the posterior distribution concentrates at the rate around the true 

density of the observations (in terms of the $L_1$ distance) under very general conditions on
the prior. In (Rousseau and Mengersen, 2011), the authors prove that by choosing small
values of the $\alpha_i$'s, the posterior distribution concentrates on the configuration where the extra
components are emptied, which is desirable since it leads to an interpretable posterior
distribution. Here the story is quite different because favouring empty components (small weights
in the stationary measure $\mu_\theta$) also amounts to favouring slow mixing Markov chains, i.e.
$Q_\theta$'s such that $\rho_\theta$ is close to 1. Then the asymptotic behaviour of the likelihood is much
less stable and it is not clear that the posterior will concentrate on the correct densities
$f_{l,\theta_0}$. Hence, to be able to interpret the posterior distribution correctly it is more desirable
to choose large values of the $\alpha_i$'s. We do not claim however that the threshold $k(k - 1 + d)$ on
$\bar\alpha := \sum_{1 \le i \le k} \alpha_i$ is sharp. Our intuition is that it is probably enough to assume, at least
when $k_0 = k - 1$, that $\bar\alpha > k(k_0 - 1 + d)$, which corresponds to the number of constraints
involved in the construction (8), but we have not been able to prove it, except in the case
$k = 2$, see Section 3.5. Although a posterior concentration in terms of the marginals is
useful when interest lies in fitting the model or in prediction, in some applications it is also
interesting to recover the parameters correctly. In the following sections we use Theorem 2
to recover all the HMM dynamics, that is to recover the number of hidden states, and,
given that the number of hidden states is known, to recover the parameters. For this, we
need to understand what the posterior concentration result says about the parameters. A
key point is to understand the geometry of the set of parameters such that marginals are
close, which is done below.
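
The link between the size of the Dirichlet parameters and the mixing of the hidden chain can be illustrated by simulation. The sketch below is a rough illustration under assumptions: rows are drawn from symmetric Dirichlet distributions with a common parameter `alpha` (a special case of the priors discussed above), and mixing is summarised by the Doeblin-type constant $\delta = \sum_j \min_i q_{ij}$, so that $\rho_\theta = 1/(1-\delta)$ is close to 1 exactly when $\delta$ is close to 0. The values 0.2 and 20 are arbitrary.

```python
import numpy as np


def mean_doeblin_constant(alpha, k, n_draws, rng):
    """Average of delta = sum_j min_i q_ij over matrices with i.i.d. Dirichlet(alpha, ..., alpha) rows."""
    # draws has shape (n_draws, k, k): one k x k transition matrix per draw.
    draws = rng.dirichlet(np.full(k, alpha), size=(n_draws, k))
    deltas = draws.min(axis=1).sum(axis=1)  # column-wise minima, summed over columns
    return deltas.mean()


rng = np.random.default_rng(0)
small = mean_doeblin_constant(alpha=0.2, k=3, n_draws=2000, rng=rng)
large = mean_doeblin_constant(alpha=20.0, k=3, n_draws=2000, rng=rng)
print("mean delta, alpha = 0.2:", small)
print("mean delta, alpha = 20 :", large)
```

Small parameters put most prior mass near matrices with almost empty columns, for which $\delta$ is close to 0, while large parameters concentrate the rows away from the boundary of the simplex.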



3.2 Distance between marginals and distance between parameters 



To recover conditions on the parameters from conditions on the $l$-marginals $f_{l,\theta}$, we need
an inequality that relates the $L_1$ distance of the $l$-marginals to the parameters of the HMM.
Such an inequality is proved in (Gassiat and van Handel, 2012) for translation mixture
models, with the strength of being uniform over the number (possibly infinite) of populations
in the mixture. However for our purpose, we do not need such a general result, and it is
possible to obtain it for more general situations than families of translated distributions,
under a structural assumption implying, in particular, the weak identifiability of the
multidimensional mixtures. Before stating the structural assumption let us state the inequality
relating the $L_1$ distance of the $l$-marginals to the parameters of the HMM.

The inequality following Theorem 3.9 of (Gassiat and van Handel, 2012) says that there
exists a constant $c(\theta_0) > 0$ such that for any small enough positive $\epsilon$,



\\fi,9 - fLOoWi 



c{0o) 



l<i<;:Vi,||7j-7f ||>e 



+ [imXiae A{h,...,^l))~Veo{Xl.,l^il■■■^l)\ 

l<ii,...,ii<ko 



J2 PoiXi:i=n---ji) 



iji.,...,j,)eA{ii,....ii) 




(10)

where $A(i_1, \dots, i_l) = \{(j_1, \dots, j_l) : \|\gamma_{j_1} - \gamma^0_{i_1}\| \le \epsilon, \dots, \|\gamma_{j_l} - \gamma^0_{i_l}\| \le \epsilon\}$. The above lower
bound essentially corresponds to a partition of $\{1, \dots, k\}$ into $k_0 + 1$ groups, where the first
$k_0$ groups correspond to the components that are close to true distinct components in the
multivariate mixture and the last corresponds to components that are emptied. The first
term on the right hand side controls the weights of the components that are emptied (group
$k_0 + 1$), the second term controls the sum of the weights of the components belonging to
the $i$-th group, for $i = 1, \dots, k_0$ (components merging with the true $i$-th component), the
third controls the distance between the mean value over group $i$ and the true value of the
$i$-th component in the true mixture, while the last term controls the distance between each
parameter value in group $i$ and the true value of the $i$-th component.

Let us now introduce a general assumption under which (10) will hold. For this we need
to introduce some notation. For all $l \le n$, for all $I = (i_1, \dots, i_l) \in \{1, \dots, k\}^l$, define
$\gamma_I = (\gamma_{i_1}, \dots, \gamma_{i_l})$, $G_{\gamma_I} = \prod_{t=1}^{l} g_{\gamma_{i_t}}(y_t)$, $D^1 G_{\gamma_I}$ the vector of first derivatives of $G_{\gamma_I}$ with
respect to each of the distinct elements in $\gamma_I$ (note that it has dimension $d \times |I|$, where $|I|$
denotes the number of distinct indices in $I$), and similarly define $D^2 G_{\gamma_I}$ the symmetric matrix
in $\mathbb{R}^{d|I| \times d|I|}$ made of second derivatives of $G_{\gamma_I}$ with respect to the distinct elements (indices)
in $\gamma_I$. If $b$ is a vector, $b^T$ denotes the transpose vector.

Let $T = \{t = (t_1, \dots, t_{k_0}) \in \{1, \dots, k\}^{k_0} : t_i < t_{i+1}, i = 0, \dots, k_0 - 1\}$. For any $t =
(t_1, \dots, t_{k_0}) \in T$, define for all $i \in \{1, \dots, k_0\}$ the set $J(i) = \{t_{i-1} + 1, \dots, t_i\}$, using $t_0 = 0$.
We then consider the following condition:

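• Condition (L($l$)) For any $t = (t_1, \dots, t_{k_0}) \in T$, for all collections $(\pi_I)_I$, $(\gamma_I)_I$, $I \notin
\{1, \dots, t_{k_0}\}^l$, satisfying $\pi_I \ge 0$, $\gamma_I = (\gamma_{i_1}, \dots, \gamma_{i_l})$ such that $\gamma_{i_j} = \gamma^0_i$ when $i_j \in J(i)$ for
some $i \le k_0$ and $\gamma_{i_j} \in \Gamma \setminus \{\gamma^0_i, i = 1, \dots, k_0\}$ when $i_j \notin \{1, \dots, t_{k_0}\}$, for all collections
$(a_I)_I$, $(c_I)_I$, $(b_I)_I$, $I \in \{1, \dots, k_0\}^l$, $a_I \in \mathbb{R}$, $c_I \ge 0$ and $b_I \in \mathbb{R}^{d|I|}$, for all collections
of vectors $z_{I,J} \in \mathbb{R}^{d|I|}$ with $I \in \{1, \dots, k_0\}^l$ and $J \in J(i_1) \times \cdots \times J(i_l)$ satisfying
$\|z_{I,J}\| = 1$, and all sequences $(\alpha_{I,J})$ satisfying $\alpha_{I,J} \ge 0$ and $\sum_{J \in J(i_1) \times \cdots \times J(i_l)} \alpha_{I,J} = 1$,

$$\sum_{I \notin \{1, \dots, t_{k_0}\}^l} \pi_I G_{\gamma_I} + \sum_{I \in \{1, \dots, k_0\}^l} \big(a_I G_{\gamma^0_I} + b_I^T D^1 G_{\gamma^0_I}\big) + \sum_{I \in \{1, \dots, k_0\}^l} c_I \sum_{J \in J(i_1) \times \cdots \times J(i_l)} \alpha_{I,J}\, z_{I,J}^T D^2 G_{\gamma^0_I} z_{I,J} = 0$$
$$\Longleftrightarrow \quad \sum_{I \notin \{1, \dots, t_{k_0}\}^l} \pi_I + \sum_{I \in \{1, \dots, k_0\}^l} \big(|a_I| + \|b_I\| + c_I\big) = 0. \tag{11}$$

This condition is a multivariate version of the condition that would be required if only
$l = 1$ was considered. In this case the condition can be written as: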

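• Condition (L(1)) For any $t = (t_1, \dots, t_{k_0}) \in T$, consider $(\pi_i)_{i=1}^{k - t_{k_0}}$ (if $t_{k_0} < k$) a
set of nonnegative reals, $(a_i)_{i=1}^{k_0}$ and $(b_i)_{i=1}^{k_0}$ with $a_i \in \mathbb{R}$ and $b_i \in \mathbb{R}^d$, and $(c_i)_{i=1}^{k_0}$
and $z_{i,j}$, $\alpha_{i,j}$, $i = 1, \dots, k_0$, $j = 1, \dots, t_i - t_{i-1}$, with $t_0 = 0$ and $z_{i,j} \in \mathbb{R}^d$ satisfying
$\|z_{i,j}\| = 1$ and $\alpha_{i,j} \ge 0$ and $\sum_j \alpha_{i,j} = 1$, for any $(\gamma_i)_{i=1}^{k - t_{k_0}}$ which belongs to

$\Gamma \setminus \{\gamma^0_i, i = 1, \dots, k_0\}$,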

E + E {"■'■9^0 + bjD^g^o'j + ^c^ E '^hj^L^^9j°^'-'3 = 0, (12) 

i—1 i—1 i—1 3 — 1 

if and only if 

ai = 0, 6j = 0, = V« = l,...,fco, Vj = 1, . . . tt^ = Vi = 1, . . . , fc-ifc^ 

Note that the partition represents the clustering structure of the extra components, 
up to a permutation of the labels. 



Condition (L(1)) is the same condition as in (Rousseau and Mengersen, 2011), so that
it is satisfied in particular for Poisson mixtures, location-scale Gaussian mixtures and any
mixtures of regular exponential families. We now have:

Proposition 1 Assume that the function $\gamma \mapsto g_\gamma(y)$ is twice continuously differentiable in
$\Gamma$ and that for all $y$, $g_\gamma(y)$ vanishes as $\|\gamma\|$ tends to infinity. Then, if condition (L($l$)) is
verified, (10) holds. Moreover, condition (L($l$)) ($l > 1$) is verified as soon as condition
(L(1)) is verified.


3.3 Application to the estimation of the number of hidden states 

We will now define a Bayesian estimator for the number of hidden states. Let $(u_n)_{n\ge1}$ and
$(v_n)_{n\ge1}$ be sequences of real numbers tending to 0 as $n$ tends to infinity. For any $\theta \in \Theta$, let
$J(\theta)$ be the set

$$J(\theta) = \{j : P_\theta(X_1 = j) \ge u_n\}.$$

For any $j \in J(\theta)$, let

$$A_j(\theta) = \{i \in J(\theta) : P_\theta(X_1 = j)\,\|\gamma_j - \gamma_i\|^2 \le v_n\}.$$

We now say that the elements $j_1$ and $j_2$ of $J(\theta)$ are merging, and we write $j_1 \sim j_2$, if there is
a sequence $i_1,\dots,i_r$ of elements of $J(\theta)$ such that $i_1 = j_1$, $i_r = j_2$ and $A_{i_1}(\theta) \cap A_{i_2}(\theta) \ne \emptyset$,
$\dots$, $A_{i_{r-1}}(\theta) \cap A_{i_r}(\theta) \ne \emptyset$. We finally define $L(\theta)$ as the number of equivalence classes with
respect to this equivalence relation.
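
As an illustration only, the construction of $L(\theta)$ can be sketched in a few lines of Python. The sketch assumes scalar emission parameters ($d = 1$) and reads the merging sets as $A_j(\theta) = \{i \in J(\theta) : P_\theta(X_1 = j)(\gamma_j - \gamma_i)^2 \le v_n\}$; the function name `estimate_order` and its arguments are hypothetical and not part of the paper.

```python
def estimate_order(mu, gammas, u_n, v_n):
    """Count merging classes L(theta).

    mu     : stationary weights P_theta(X_1 = j), one per state (assumed input)
    gammas : scalar emission parameters gamma_j (d = 1 is an assumption)
    u_n    : threshold below which a state is considered emptied
    v_n    : threshold on the weighted squared distance for merging
    """
    k = len(mu)
    # J(theta): states that are not emptied.
    J = [j for j in range(k) if mu[j] >= u_n]
    # A_j(theta): states of J close to j on the weighted squared scale.
    A = {j: {i for i in J if mu[j] * (gammas[j] - gammas[i]) ** 2 <= v_n}
         for j in J}

    # Union-find over J: j1 ~ j2 when a chain of intersecting A-sets links them.
    parent = {j: j for j in J}

    def find(j):
        while parent[j] != j:
            parent[j] = parent[parent[j]]
            j = parent[j]
        return j

    for a in J:
        for b in J:
            if A[a] & A[b]:
                parent[find(a)] = find(b)

    return len({find(j) for j in J})
```

In this sketch an emptied state (weight below `u_n`) never enters $J(\theta)$, and two retained states with nearly equal parameters fall in the same class, so the returned count ignores both kinds of extra states.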

The following corollary says that the posterior distribution of $L(\theta)$ concentrates on the
true number $k_0$ of hidden states. An estimator of the order can then be, for instance, the
posterior mode of the distribution of $L(\theta)$; however, the whole posterior distribution is also
of interest.



Corollary 1 Let the assumptions of Theorem 2 hold, and assume that condition (L($\ell$)) is
verified. Assume moreover that $v_n/u_n = o(1)$ and $w_n/u_n = o(1)$ as $n$ tends to infinity, where
$w_n = n^{-\left(\sum_{1\le i\le k}\alpha_i - k(k-1+d)\right)/\left(2\sum_{1\le i\le k}\alpha_i\right)}(\log n)$ if the Dirichlet type prior assumption (F1)
holds and $w_n = n^{-1/2}(\log n)^{3/2}$ if the exponential type prior assumption (FE1) holds. Then

$$P^\pi\left[\theta : L(\theta) \ne k_0 \mid Y_{1:n}\right] = o_{P_{\theta_0}}(1).$$

Let us prove the result. Notice that for some constant C > 0, 

I31 



7° 

hi 



In 



7° 



>c\ 



In 



V 1 1 7,, 



7° I 

hi I 



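so that, using Proposition 1, we get that for maybe some other constant $c(\theta_0) > 0$:

$$\|f_{\ell,\theta} - f_{\ell,\theta_0}\|_1 \ge c(\theta_0)\Bigg[ \sum_{1\le j\le k:\ \forall i,\ \|\gamma_j - \gamma_i^0\| > \epsilon} P_\theta(X_1 = j) + \sum_{1\le i_1,\dots,i_\ell\le k_0} \Big| \sum_{(j_1,\dots,j_\ell)\in A(i_1,\dots,i_\ell)} P_\theta(X_{1:\ell} = j_1\cdots j_\ell) - P_{\theta_0}(X_{1:\ell} = i_1\cdots i_\ell) \Big|$$
$$\qquad + \sum_{i=1}^{k_0} \Big\| \sum_{j:\ \|\gamma_j - \gamma_i^0\|\le\epsilon} P_\theta(X_1 = j)(\gamma_j - \gamma_i^0) \Big\| + \frac12 \sum_{i=1}^{k_0} \sum_{j:\ \|\gamma_j - \gamma_i^0\|\le\epsilon} P_\theta(X_1 = j)\,\|\gamma_j - \gamma_i^0\|^2 \Bigg].$$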



It follows that as soon as

$$\|f_{\ell,\theta} - f_{\ell,\theta_0}\|_1 \lesssim w_n$$

we get that, for any $j \in \{1,\dots,k\}$, either $P_\theta(X_1 = j) \le u_n$, or

$$\exists i \in \{1,\dots,k_0\},\quad P_\theta(X_1 = j)\,\|\gamma_j - \gamma_i^0\|^2 \lesssim w_n.$$

This means that extra states are emptied or merge with true ones. The corollary follows
easily from Theorem 2 and the definition of $L(\theta)$.



3.4 Consistent estimation of the parameters when the order is 
known 

When the model is correctly specified, that is when the true number of states $k_0$ is equal
to $k$, we are able to refine the concentration result in two ways: (i) obtain the usual $1/\sqrt{n}$
concentration rate and (ii) obtain a posterior concentration rate in the parameter scale, as
soon as the prior vanishes quickly enough near $q_{ij} = 0$, $i \ne j$.

Corollary 2 When $k = k_0$, assume that the assumptions of Theorem 2 hold and that
condition (L($\ell$)) is verified. If the prior is of Dirichlet type with $\bar\alpha = \sum_{i=1}^{k}\alpha_i > k(k-1+d)$
or if the prior is of exponential type, for any sequence $M_n$ tending to infinity,



and 



M 

\\fi,o,-fLe\\i<^\yi: 



M, 



1 + op.„ (1) 



1 + on, (1) • 



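The proof of Corollary 2 is given in Section 4.5. The idea behind the proof is that, if $k = k_0$,
(10) gives

$$\|f_{\ell,\theta} - f_{\ell,\theta_0}\|_1 \ge c(\theta_0) \sum_{1\le i_1,\dots,i_\ell\le k_0} \left| P_\theta(X_{1:\ell} = i_1\cdots i_\ell) - P_{\theta_0}(X_{1:\ell} = i_1\cdots i_\ell) \right|$$

so that, if, for some sequence $u_n$ tending to 0,

$$\|f_{\ell,\theta} - f_{\ell,\theta_0}\|_1 \lesssim u_n$$

we obtain that, for all $i_1\cdots i_\ell$, and large enough $n$,

$$\left| P_\theta(X_{1:\ell} = i_1\cdots i_\ell) - P_{\theta_0}(X_{1:\ell} = i_1\cdots i_\ell) \right| \lesssim u_n$$

which means in particular that for all $i, j$,

$$\left| P_\theta(X_1 = i) - P_{\theta_0}(X_1 = i) \right| \lesssim u_n, \quad\text{and}\quad |q_{ij} - q_{ij}^0| \lesssim u_n.$$

Since $q_{ij}^0 > 0$ for all $i, j \le k$, there exists $a > 0$ such that for all $i, j \le k$, $q_{ij} > a$ and
$P_\theta(X_1 = j) > a$. But $\rho(\theta) - 1 \ge \sum_{j=1}^{k} \min_i q_{ij}$, hence in this case $\rho(\theta) - 1 \ge ka > 0$.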

3.5 Asymptotic behaviour of the posterior distribution when $k_0 = 1$ and $k = 2$

In this section we restrict our attention to the simpler case where $k_0 = 1$ and $k = 2$. We will
see that despite its apparent simplicity, the asymptotic analysis of the posterior distribution
leads to a guideline on the choice of the prior parameters $\alpha_i$'s which is (almost) opposite
to that proposed in (Rousseau and Mengersen, 2011) in the case of mixture models. We
still consider situations where (10) holds, and choose independent Dirichlet type priors for
the rows of the transition matrix. We prove that the extra component is not emptied but
merges with the true one, under large enough $\alpha_i$'s for the Dirichlet prior.
When $k = 2$, we can parameterize $\theta$ as $\theta = (p, q, \gamma_1, \gamma_2)$, with $0 \le p \le 1$, $0 \le q \le 1$, so that

$$Q_\theta = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}, \qquad \mu_\theta = \left( \frac{q}{p+q}, \frac{p}{p+q} \right)$$

when $p \ne 0$ or $q \ne 0$. If $p = 0$ and $q = 0$, set $\mu_\theta = (\tfrac12, \tfrac12)$ for instance. Also, we may take

$$\rho_\theta - 1 = (p+q) \wedge (2 - (p+q)).$$
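
The two-state parameterization can be checked numerically. The following is a minimal sketch, assuming the forms $Q_\theta = \begin{pmatrix}1-p & p\\ q & 1-q\end{pmatrix}$, $\mu_\theta = (q/(p+q), p/(p+q))$ and $\rho_\theta - 1 = (p+q)\wedge(2-(p+q))$ read from the text; the helper names are hypothetical. It also compares this choice of $\rho_\theta - 1$ with the column-minimum sum $\sum_j \min_i q_{ij}$ used earlier in the section.

```python
def two_state_chain(p, q):
    """Transition matrix, stationary law and rho - 1 for the two-state chain."""
    Q = [[1.0 - p, p], [q, 1.0 - q]]
    if p + q > 0:
        mu = [q / (p + q), p / (p + q)]
    else:
        # Convention used in the text when p = q = 0.
        mu = [0.5, 0.5]
    rho_minus_one = min(p + q, 2.0 - (p + q))
    return Q, mu, rho_minus_one


def column_min_sum(Q):
    """Sum over columns j of min over rows i of q_ij."""
    k = len(Q)
    return sum(min(Q[i][j] for i in range(k)) for j in range(k))


def is_stationary(mu, Q, tol=1e-12):
    """Check mu Q = mu coordinate by coordinate."""
    k = len(Q)
    return all(abs(sum(mu[i] * Q[i][j] for i in range(k)) - mu[j]) < tol
               for j in range(k))
```

For $k = 2$ the column-minimum sum equals $(p+q)\wedge(2-(p+q))$ exactly, which the assertions below exercise on two parameter values.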

When $k_0 = 1$, observations are i.i.d. with distribution $g_{\gamma^0}\,d\nu$, so that one may take $\theta_0 =
(p, 1-p, \gamma^0, \gamma^0)$ for any $0 < p < 1$, or $\theta_0 = (0, q, \gamma^0, \gamma)$ for $0 < q < 1$ and any $\gamma$, or
$\theta_0 = (p, 0, \gamma, \gamma^0)$ for any $0 < p < 1$ and any $\gamma$. Also, for any $x \in \mathcal{X}$, $P_{\theta_0,x} = P_{\theta_0}$ and

$$\ell_n(\theta, x) - \ell_n(\theta_0, x_0) = \ell_n(\theta, x) - \ell_n(\theta_0, x).$$

We take independent Beta priors on $(p, q)$:

$$\pi(p, q) = C_{\alpha,\beta}\, p^{\alpha-1}(1-p)^{\beta-1} q^{\alpha-1}(1-q)^{\beta-1},$$

thus satisfying (F1).

Let $u_n = M_n/\sqrt{n}$ with $M_n$ tending to infinity. We shall prove that $P^\pi(B_n \mid Y_{1:n}) = 1 + o_P(1)$,
as soon as assumptions (F2) and (F3) hold and as soon as $\alpha > 3d/4$ and $\beta > 3d/4$, where
$B_n$ is the set

$$B_n = \left\{ \left((p+q) \wedge (2 - (p+q))\right) \|f_{2,\theta} - f_{2,\theta_0}\|_1 \le u_n \right\}.$$

Then, for any sequence of sets $(A_n)_{n\ge1}$, for any $D > 0$, and for any sequence $(C_n)_{n\ge1}$ of
real numbers



Ee„ [P" (A„|ri:„)] = Eg,, [P^ {An n Bn\Yi.,n)] + O (1) 



Ef, 



0(1) 



Eqo 



o(l) 



< 



Thus, if 



one gets 



(j-^n ^ Cn 



-D/2 \ _ 



0(1), 



7r2(A„nB„) + o(l) 



Eg,[P"' {An\Yi.,n)] < ^2 (A„ H B„) + O (1) 



(13) 
(14) 



Let e„ decrease to in such a way that tends to 0. Consider the set 

$$A_n = \left\{ \frac{p}{p+q} \le \epsilon_n \ \text{ or } \ \frac{q}{p+q} \le \epsilon_n \right\}.$$

Then the following holds: 



Corollary 3 Assume that the assumptions of Theorem 2 hold and that condition (L($\ell$)) is
verified. If moreover for all $x$, $\gamma \mapsto g_\gamma(x)$ is four times continuously differentiable on $\Gamma$, and
if for any $\gamma \in \Gamma$ there exists $\epsilon > 0$ such that for any $i = 1, 2, 3, 4$, if $D^i_{\gamma'} g$ denotes the $i$-th
differential operator (with respect to $\gamma$) of $g$ at point $\gamma'$,

$$\int \sup_{\gamma' \in B_d(\gamma,\epsilon)} \left\| \frac{D^i_{\gamma'} g_{\gamma'}}{g_{\gamma'}}(y) \right\|^4 g_\gamma(y)\,\nu(dy) < +\infty,$$

the extra component cannot be emptied at rate $\epsilon_n$, that is

$$P^\pi(A_n \mid Y_{1:n}) = o_P(1)$$

as soon as $\alpha > 3d/4$ and $\beta > 3d/4$.
To prove Corollary 3 we prove that

7T2{An n Bn) < U^" + + ui+'^/^f-''/^ , 



(15) 



(16) 



and that (13) holds with $D = d + d/2$ as soon as $C_n$ tends to infinity as $n$ tends to infinity.
Thus using (13), Corollary 3 follows. The detailed proof is given in the appendix.

4 Proofs



4.1 Proof of Theorem 1



The proof follows the same lines as in (Ghosal and van der Vaart, 2006). Let $(\epsilon_n)_{n\ge1}$ be a
sequence of positive real numbers. We write



\\fi,e - fi,eo\\ 



1 



2Rg + pg-l 



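where $A_n = \left\{\theta : \|f_{\ell,\theta} - f_{\ell,\theta_0}\|_1 \frac{\rho_\theta - 1}{2R_\theta + \rho_\theta - 1} \ge \epsilon_n\right\}$. A lower bound on $D_n$ is obtained in the
following usual way. Set for any real number $C$, $\Omega_n(C) = \{(\theta, x) : \ell_n(\theta, x) - \ell_n(\theta_0, x_0) > -C\}$,
which is a random subset of $\Theta \times \mathcal{X}$ (depending on $Y_{1:n}$).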

Dn > [ f[n^ic)e'"^'-''^-'-^''"'^°K{de)7rxidx) 
Js„ 

therefore using (A1), there exists $c_1 > 0$ such that for any sequence $(C_n)_{n\ge1}$ of real numbers
tending to $+\infty$,



Dn < cie "n 



< ¥g,[Dn<e-^-n®Trx{Sn)/2] 

J^^ Peo [inie,x)-£ni9o,xo) < -C\]TT{d9)TTx{dx) 



< 2- 



oil), 



by assumption (Al) again. 

Thus, for any sequence $(C_n)_{n\ge1}$ of real numbers tending to $+\infty$,



Ifi.e — /;,eol 



- 1 



2Re 



> en\Yi..r 



But 
and 



„£„(e,x)-£„(eo,^o)^((^5,)^^ (dx) 



Eg 



{A„nJ^^}xX 



Mie.x)^e.ieo,xo)^(^^Q-^^^ (dx) 



= o[^(A„nj-;^)] = o(n-^/2) 



by Fubini's theorem and the fact that, by (A1), $\ell_n(\theta_0) - \ell_n(\theta_0, x_0)$ is uniformly upper
bounded, so that by taking $C_n$ tending to $+\infty$ slowly enough,



\fi,e - //,0ol 



1 



2Rg + pg-l 



OPe,, + -j=r^Dr^>cin-o/2e-c„ 



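where $N_n = \int_{(A_n \cap \mathcal{F}_n)\times\mathcal{X}} e^{\ell_n(\theta,x) - \ell_n(\theta_0,x_0)}\,\pi(d\theta)\,\pi_{\mathcal{X}}(dx)$. Let now $(\theta_j)_{j=1,\dots,N}$, $N = N(\delta, \mathcal{F}_n, d_\ell(\cdot,\cdot))$,
be the sequence of $\theta_j$'s in $\mathcal{F}_n$ such that for all $\theta \in \mathcal{F}_n$ there exists a $\theta_j$ with $d_\ell(\theta_j, \theta) \le \delta$ (for
some $\delta$ to be fixed later). Assume for simplicity's sake and without loss of generality that $n$
is a multiple of the integer $\ell$, and define

$$\phi_j = 1_{\left\{\sum_{i=1}^{n/\ell}\left(1_{(Y_{\ell(i-1)+1},\dots,Y_{\ell i})\in A_j} - P_{\theta_0}((Y_1,\dots,Y_\ell)\in A_j)\right) > t_j\right\}},$$

where

$$A_j = \left\{(y_1,\dots,y_\ell) \in \mathcal{Y}^\ell : f_{\ell,\theta_0}(y_1,\dots,y_\ell) \le f_{\ell,\theta_j}(y_1,\dots,y_\ell)\right\}$$

for some positive real number $t_j$ to be fixed later also. Note that

$$P_{\theta_j}((Y_1,\dots,Y_\ell)\in A_j) - P_{\theta_0}((Y_1,\dots,Y_\ell)\in A_j) = \frac12\|f_{\ell,\theta_j} - f_{\ell,\theta_0}\|_1.$$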
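
The identity behind this choice of test set is the usual total variation formula: on the region where $f_{\ell,\theta_0} \le f_{\ell,\theta_j}$, the probability gap equals half the $L_1$ distance. A minimal sketch on a finite sample space (an assumption made purely for illustration; the helper names are hypothetical) is:

```python
def acceptance_region(f0, f1):
    """Indices where the null density does not exceed the alternative one."""
    return [i for i in range(len(f0)) if f0[i] <= f1[i]]


def prob(f, region):
    """Probability of a region under the probability vector f."""
    return sum(f[i] for i in region)


def l1_distance(f, g):
    """L1 distance between two probability vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(f, g))


def probability_gap(f0, f1):
    """P_1(A) - P_0(A) for A = {f0 <= f1}."""
    region = acceptance_region(f0, f1)
    return prob(f1, region) - prob(f0, region)
```

On any pair of probability vectors, `probability_gap(f0, f1)` should agree with `0.5 * l1_distance(f0, f1)` up to rounding.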



Define also 



Then 



i<j<N:ejeA„ 



■^«o Tr^« ^ Eeaipn < N{d,Tn,d{., .)) . max 



and using the usual equality 



i<j<N:ejeA„ 



Eg„ (iV„(l-^„) 

so that: 



X 



Ee,. ((1 - ^n)) TT (dd) TTX (dx) 



\\fl,e - /i,9o H 1 r, p 7 - ''n\Yl:n 

2Rg + pe - I 



< 



o(l) + (?) 

O n^/^e^" 



M 



max Eg„(j)n 
i<j<N:ejeA„ ■' 



(17) 



Now 



E - Fe„((ri, . . . , 11) € A,)) > t, 



and 



n/l 



n/l 



> -*J + E • ■ • : ^h) € Aj) - Fg„{{Yi, ...,Yi)€ Aj)) 



1=1 



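Consider the sequence $(Z_i)_{i\ge1}$ with, for all $i \ge 1$, $Z_i = (X_{\ell(i-1)+1},\dots,X_{\ell i}, Y_{\ell(i-1)+1},\dots,Y_{\ell i})$,
which is, under $P_\theta$, a Markov chain with transition kernel $\tilde Q_\theta$ given by

$$\tilde Q_\theta(z, dz') = g_\theta(y_1'|x_1')\cdots g_\theta(y_\ell'|x_\ell')\,Q_\theta(x_\ell, dx_1')\,Q_\theta(x_1', dx_2')\cdots Q_\theta(x_{\ell-1}', dx_\ell')\,\mu(dy_1')\cdots\mu(dy_\ell').$$

This kernel satisfies the same uniform ergodic property as $Q_\theta$, with the same coefficients,
that is, the condition holds with the coefficients $R_\theta$ and $\rho_\theta$ with the replacement of $Q_\theta$ by
$\tilde Q_\theta$, and we may use (Rio, 2000)'s exponential inequality (Corollary 1) with uniform mixing
coefficients (as defined in (Rio, 2000)) satisfying $\phi(k) \le R_\theta\rho_\theta^{-k}$, to obtain that, for any positive
real number $u$,

$$P_{\theta_0}\left(\sum_{i=1}^{n/\ell}\left(1_{(Y_{\ell(i-1)+1},\dots,Y_{\ell i})\in A_j} - P_{\theta_0}((Y_1,\dots,Y_\ell)\in A_j)\right) > u\right) \le \exp\left\{\frac{-2\ell u^2(\rho_{\theta_0} - 1)^2}{n\,(2R_{\theta_0} + \rho_{\theta_0} - 1)^2}\right\} \qquad (18)$$

and

$$P_\theta\left(-\sum_{i=1}^{n/\ell}\left(1_{(Y_{\ell(i-1)+1},\dots,Y_{\ell i})\in A_j} - P_\theta((Y_1,\dots,Y_\ell)\in A_j)\right) > u\right) \le \exp\left\{\frac{-2\ell u^2(\rho_\theta - 1)^2}{n\,(2R_\theta + \rho_\theta - 1)^2}\right\}. \qquad (19)$$

Set now

$$t_j = \frac{n\,\|f_{\ell,\theta_j} - f_{\ell,\theta_0}\|_1}{4\ell}.$$

Since for any $\theta$, $\frac{\rho_\theta - 1}{2R_\theta + \rho_\theta - 1} \le 1$ and since consequently for $\theta_j \in A_n$, $\|f_{\ell,\theta_j} - f_{\ell,\theta_0}\|_1 \ge \epsilon_n$, we
first get, using (18),

$$E_{\theta_0}\phi_j \le \exp\left\{\frac{-n\epsilon_n^2(\rho_{\theta_0} - 1)^2}{8\ell\,(2R_{\theta_0} + \rho_{\theta_0} - 1)^2}\right\}. \qquad (20)$$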
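
The passage from the deviation inequality to the bound on the first-kind error is a substitution: taking the threshold $u = n\epsilon/(4\ell)$ in a bound of the form $\exp\{-2\ell u^2(\rho-1)^2/(n(2R+\rho-1)^2)\}$ gives $\exp\{-n\epsilon^2(\rho-1)^2/(8\ell(2R+\rho-1)^2)\}$. The sketch below checks this algebra numerically; the two exponential forms are the ones read from (18) and (20), taken here as assumptions, and the function names are hypothetical.

```python
import math


def deviation_bound(u, n, ell, R, rho):
    """Exponential deviation bound of the form assumed for (18)."""
    return math.exp(-2.0 * ell * u ** 2 * (rho - 1.0) ** 2
                    / (n * (2.0 * R + rho - 1.0) ** 2))


def first_kind_bound(eps, n, ell, R, rho):
    """Bound of the form assumed for (20), with L1 distance replaced by eps."""
    return math.exp(-n * eps ** 2 * (rho - 1.0) ** 2
                    / (8.0 * ell * (2.0 * R + rho - 1.0) ** 2))


def threshold(eps, n, ell):
    """Test threshold t = n * eps / (4 * ell)."""
    return n * eps / (4.0 * ell)
```

Evaluating `deviation_bound(threshold(eps, n, ell), n, ell, R, rho)` and `first_kind_bound(eps, n, ell, R, rho)` on the same inputs should give the same number.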

Now, for any 6 € v4„, 

- + ^ {¥gA{Yu-i+u ■.■,Yu)e A^) - Pe„((ri, . . . ,yO e ^,)) 
1=1 

= -I^^^^i—^ + ^ {P,^.((y„ . . . ,yo e A,) - P.„((ri, . . . ; y,) e ^.)} 



+ - {P9((yi, . . . , y,) e ^j) - Pe, ((1^1, ... , e A,)} 



n/l 



J2 i^eAiYu-i+i,- . .,>;.) e A,) - Pe((n, . . . , i1) G A,)) 



> 



n\\fi,e, - fi.0o\\i n-\\fi.ej -~ fi,e\\ 



n/l 



41 



I 



i=l 



> 



A\fi,9 - fi,eo\\i 5"ll/i,9, - /i.elli Repe 



41 

- 4l\ 41 



41 

II f f II -^gPg -> ("^ 



-1- \ 4l 



1 - 



41 / 2e, 



\.fue - //.So I 



Wfi.e—fi.OQ II 1 Pa — 1 



using the triangular inequality and the fact that — < < 4] 2_R8+p£)-i 

since 9 & An and 2fl^+p^^-i < L As soon as ^ (l - ^) - 27;;- > 0, (dH) gives, for 9 € A„, 



Ee,x (1 - 0i) < exp |- 
We finally get, using (17), (20) and (21)

jOfl - 1 



2/ /n 



Wfifii ~ fi,eo\\ 



2Rg 



- 1 



> en\Yi.,, 



41 / 2e„ 



< 0(1)+ can^/^e'^" exp 



(21) 



21 f n 
41 



M 

^) exp 



/ -'^en (pep - if 
\8li2Re,+peo - if 



for some $c_2 > 0$. Taking $\epsilon_n = K\sqrt{\log n / n}$ and $C_n$ tending to $+\infty$ slowly enough, it is easy to

see that



2Rg 



1 



= 0(1) 



as soon as $K$ is large enough, and the first part of Theorem 1 is proved.

Assume now that Assumption (A4) holds, let $M_n$ tend to infinity and take $\epsilon_n = M_n/\sqrt{n}$.
By writing $A_n \cap \mathcal{F}_n = \bigcup_{m\ge1} A_{n,m}(\epsilon_n)$ and using the same reasoning, one gets, for some positive
constant $c$:



1 



2i?e 



o (1) + n^/^e^" ^ (A>,m(e«)) exp {-cm^ne^ } 



m>l 



^ ^ (^:^n,m(en),d/(---)) cxp { -cm^?ie^ } 



771 > 1 



o(l) 



and the second part of Theorem 1 is proved.



4.2 Proof of Theorem 2

The proof consists in showing that assumptions (A0)-(A3) of Theorem 1 are satisfied. Assumption
(F0) and the construction allow us to define a $\theta_0 \in \Theta_k$ such that (A0) holds with
$D = k(k-1) + kd$. Then using (F1), (F2) and the computations of Section 3.1, (A1) holds.
To prove that (A2) and (A3) hold, recall that if $\theta = (q_{ij}, 1 \le i \le k, 1 \le j \le k-1; \gamma_1,\dots,\gamma_k)$
is such that $(q_{ij}, 1 \le i \le k, 1 \le j \le k-1) \in \Delta_k^0$, then $\mu_\theta$ is uniquely defined. Let us now

define

-T^n = = (gij, 1 < « < fc, 1 < j < fc - l;7i, ....,7fc) : % > «„, 1 < z < fc, 1 < j < fc. 



Then, using (F2) together with (9) and Lemma 1 in Appendix 4.3 we obtain that for some
constant $B$,

V(0i,02)e^,i \\fe,-fe,\\,<B(j^^ 



so that for some other constant B, 



B ( 1 



„2c 



-1 /c(/c-l)+A:d 



and (A3) holds if Vn is larger than some negative power of n. Now, (Fl) gives 

7r(J^„) = 0(w„ - - +U„ - - ). 

Let then = n-^/2™"i<'<'= "-/v^b^ and u„ = n"-°/^^i<'<'= "'/VIogTI. Then, (A2) and 
(A3) hold. Thus, Theorem 1 implies that



(1) = P^" 



logn 



Yl:r. 



e Tn and Wfi^g - fi,e,,\\i{pe - I) > K 



logn 



l:n 



-OPe„ (1) 



Since pe - I > J2i=i mini<,j<fc qij, for aU 9 e Tn, pe - I > Un 



\\fi,e~fi,eMp0-^)>K 



logn 



> 



\fLe~fLe„\\i>2K- 



1 /logn 



and the theorem follows when (Fl) holds. If now (FEl) holds instead of (Fl), one gets, 
taking u„ = w„, 

TT{T^)^0{v,,eM~C/vn)). 

Then, taking $v_n = h/\log n$ with small enough $h$ gives that (A2) and (A3) hold. The end
of the proof follows similarly as before.
4.3 Derivatives of the stationary distribution: Lemma 1

Lemma 1 The function $\theta \mapsto \mu_\theta$ is continuously differentiable in $(\Delta_k^0)^k \times \Gamma^k$ and there exists
an integer $c > 0$ and a constant $C > 0$ such that for any $1 \le i \le k$, $1 \le j \le k-1$, any
$m = 1,\dots,k$,

$$\left|\frac{\partial\mu_\theta(m)}{\partial q_{ij}}\right| \le \frac{C}{\left(\inf_{i'\ne j'} q_{i'j'}\right)^{2c}}.$$

One may take $c = k - 1$.



Let $\theta = (q_{ij}, 1 \le i \le k, 1 \le j \le k-1; \gamma_1,\dots,\gamma_k)$ be such that $(q_{ij}, 1 \le i \le k, 1 \le j \le
k-1) \in \Delta_k^0$. Then $Q_\theta = (q_{ij}, 1 \le i \le k, 1 \le j \le k)$ is a $k \times k$ stochastic matrix with positive
entries, and $\mu_\theta$ is uniquely defined by the equation

$$\mu_\theta^T Q_\theta = \mu_\theta^T$$

if $\mu_\theta$ is the vector $(\mu_\theta(m))_{1\le m\le k}$. This equation is solved by linear algebra as

$$\mu_\theta(m) = \frac{P_m(q_{ij}, 1 \le i \le k, 1 \le j \le k-1)}{R(q_{ij}, 1 \le i \le k, 1 \le j \le k-1)},\quad m = 1,\dots,k-1, \qquad \mu_\theta(k) = 1 - \sum_{m=1}^{k-1}\mu_\theta(m), \qquad (22)$$

where $P_m$, $m = 1,\dots,k-1$, and $R$ are polynomials where the coefficients are integers (bounded
by $k$) and the monomials are all of degree $k-1$, each variable $q_{ij}$, $1 \le i \le k$, $1 \le j \le k-1$,
appearing with power 0 or 1. Now, since the equation has a unique solution as soon as
$(q_{ij}, 1 \le i \le k, 1 \le j \le k-1) \in \Delta_k^0$, $R$ is never 0 on $\Delta_k^0$, so it may be 0 only at
the boundary. Thus, as a fraction of polynomials with non zero denominator, $\theta \mapsto \mu_\theta$ is
infinitely differentiable in $(\Delta_k^0)^k \times \Gamma^k$, and the derivative has components all of form

$$\frac{P(q_{ij}, 1 \le i \le k, 1 \le j \le k-1)}{R(q_{ij}, 1 \le i \le k, 1 \le j \le k-1)^2}$$

where again $P$ is a polynomial where the coefficients are integers (bounded by $2k$) and the
monomials are all of degree $k-1$, each variable $q_{ij}$, $1 \le i \le k$, $1 \le j \le k-1$, appearing with
power 0 or 1. Thus, since all $q_{ij}$'s are bounded by 1 there exists a constant $C$ such that for
all $m = 1,\dots,k$, $i = 1,\dots,k$, $j = 1,\dots,k-1$,

$$\left|\frac{\partial\mu_\theta(m)}{\partial q_{ij}}\right| \le \frac{C}{R(q_{ij}, 1 \le i \le k, 1 \le j \le k-1)^2}. \qquad (23)$$

We shall now prove that

$$R(q_{ij}, 1 \le i \le k, 1 \le j \le k-1) \ge \left(\inf_{i\ne j} q_{ij}\right)^{k-1}, \qquad (24)$$

which combined with (23) implies Lemma 1. Note that we can express $R$ as a
polynomial function of $Q = (q_{ij}, 1 \le i \le k, 1 \le j \le k, i \ne j)$. Indeed, $\mu := (\mu_\theta(i))_{1\le i\le k-1}$ is
solution of

$$\mu^T M = V^T$$

where $V$ is the $(k-1)$-dimensional vector $(q_{kj})_{1\le j\le k-1}$, and $M$ is the $(k-1)\times(k-1)$-matrix
with components $M_{i,j} = q_{kj} - q_{ij} + 1_{i=j}$. Since $R$ is the determinant of $M$, this leads to,
for any $k \ge 2$:



R= E ^(-) n 

l<i<k-l,a-{i)=i 



qki 



l<j<k-lj^i 



n (.ik^ 

l<i<k-l,<7{i)^i 



qa{i)i) (25) 



where for any integer $n$, $S_n$ is the set of permutations of $\{1,\dots,n\}$, and for each permutation
$\sigma$, $\epsilon(\sigma)$ is its signature. Thus $R$ is a polynomial in the components of $Q$ where each monomial
has integer coefficient and has $k-1$ different factors. The possible monomials are of form

$$\beta \prod_{i\in A} q_{ki} \prod_{i\in B} q_{ij(i)}$$

where $(A, B)$ is a partition of $\{1,\dots,k-1\}$, and for all $i \in B$, $j(i) \in \{1,\dots,k-1\}$ and
$j(i) \ne i$. In case $B = \emptyset$ the coefficient $\beta$ of the monomial is $\sum_{\sigma\in S_{k-1}} \epsilon(\sigma) = 0$, so that we
only consider partitions such that $B \ne \emptyset$. Fix such a monomial with non null coefficient,
let $(A, B)$ be the associated partition. Let $Q$ be such that, for all $i \in A$, $q_{ki} > 0$, for all
$i \notin A$, $q_{ki} = 0$ and $q_{ik} > 0$ (used to handle the case $A = \emptyset$). Fix also $q_{ij(i)} = 1$ for all
$i \in B$. Then, if $(A', B')$ is another partition of $\{1,\dots,k-1\}$ with $B' \ne \emptyset$, the monomial
$\prod_{i\in A'} q_{ki} \prod_{i\in B'} q_{ij(i)} = 0$. Thus, $R(Q)$ equals $\prod_{i\in A} q_{ki} \prod_{i\in B} q_{ij(i)}$ times the coefficient of
the monomial. But $R(Q) > 0$, so that this coefficient is a positive integer and (24) follows.
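
The reduced linear system used in this proof, $\mu^T M = V^T$ with $M_{i,j} = q_{kj} - q_{ij} + 1_{i=j}$ and $V = (q_{kj})_{1\le j\le k-1}$, can be checked on a small example. The sketch below is an illustration only: it solves the system exactly with rational arithmetic and a plain Gauss-Jordan elimination, then recovers $\mu_\theta(k)$ from the normalization. The function name is hypothetical, and the code assumes a transition matrix with positive entries so that the system is non-singular.

```python
from fractions import Fraction


def stationary_from_reduced_system(Q):
    """Solve mu^T M = V^T for mu(1..k-1), then set mu(k) = 1 - sum.

    Q is a k x k stochastic matrix given as a list of rows of Fractions.
    """
    k = len(Q)
    n = k - 1
    # M[i][j] = q_kj - q_ij + 1{i = j}; V[j] = q_kj (last row has index k-1).
    M = [[Q[k - 1][j] - Q[i][j] + (1 if i == j else 0) for j in range(n)]
         for i in range(n)]
    V = [Q[k - 1][j] for j in range(n)]
    # Equation j reads sum_i mu(i) * M[i][j] = V[j]: augmented rows of M^T.
    A = [[M[i][j] for i in range(n)] + [V[j]] for j in range(n)]
    for col in range(n):
        # Find a non-zero pivot and move it to the diagonal.
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        pivot_value = A[col][col]
        A[col] = [a / pivot_value for a in A[col]]
        # Eliminate the column from every other row.
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    mu = [A[r][n] for r in range(n)]
    mu.append(1 - sum(mu))
    return mu
```

For $k = 2$ with $Q_\theta$ parameterized by $(p, q)$, the matrix $M$ reduces to the scalar $p + q$ and the solution to $\mu_\theta(1) = q/(p+q)$.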



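4.4 Proof of Proposition 1



To prove the first part of the Proposition we follow the ideas of the beginning of the proof
of Theorem 5.11 in (Gassiat and van Handel, 2011). If (10) does not hold, there exists a
sequence of $\ell$-marginals $(f_{\ell,\theta^n})_{n\ge1}$ with parameters $(\theta^n)_{n\ge1}$ such that for some positive
sequence $\epsilon_n$ tending to 0, $\|f_{\ell,\theta^n} - f_{\ell,\theta_0}\|_1 / N_n(\theta^n)$ tends to 0 as $n$ tends to infinity, with

$$N_n(\theta) = \sum_{1\le j\le k:\ \forall i,\ \|\gamma_j - \gamma_i^0\| > \epsilon_n} P_\theta(X_1 = j) + \sum_{1\le i_1,\dots,i_\ell\le k_0} \Bigg[ \Big| \sum_{(j_1,\dots,j_\ell)\in A_n(i_1,\dots,i_\ell)} P_\theta(X_{1:\ell} = j_1\cdots j_\ell) - P_{\theta_0}(X_{1:\ell} = i_1\cdots i_\ell) \Big|$$
$$\qquad + \Big\| \sum_{(j_1,\dots,j_\ell)\in A_n(i_1,\dots,i_\ell)} P_\theta(X_{1:\ell} = j_1\cdots j_\ell)\,(\gamma_{j_1} - \gamma_{i_1}^0,\dots,\gamma_{j_\ell} - \gamma_{i_\ell}^0) \Big\|$$
$$\qquad + \frac12 \sum_{(j_1,\dots,j_\ell)\in A_n(i_1,\dots,i_\ell)} P_\theta(X_{1:\ell} = j_1\cdots j_\ell)\,\big\|(\gamma_{j_1} - \gamma_{i_1}^0,\dots,\gamma_{j_\ell} - \gamma_{i_\ell}^0)\big\|^2 \Bigg]$$

with $A_n(i_1,\dots,i_\ell) = \{(j_1,\dots,j_\ell) : \|\gamma_{j_1} - \gamma_{i_1}^0\| \le \epsilon_n,\dots,\|\gamma_{j_\ell} - \gamma_{i_\ell}^0\| \le \epsilon_n\}$.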
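Now, $f_{\ell,\theta^n} = \sum_{I\in\{1,\dots,k\}^\ell} P_{\theta^n}((X_1,\dots,X_\ell) = I)\,G_{\gamma_I^n}$ where $\theta^n = (Q^n, (\gamma_1^n,\dots,\gamma_k^n))$, $Q^n$ a
transition matrix on $\{1,\dots,k\}$. It is possible to extract a subsequence along which, for all
$i = 1,\dots,k$, either $\gamma_i^n$ converges to some limit $\gamma_i$ or $\|\gamma_i^n\|$ tends to infinity. Choose now the
indexation such that for $i = 1,\dots,t_1$, $\gamma_i^n$ converges to $\gamma_1^0$, for $i = t_1+1,\dots,t_2$, $\gamma_i^n$ converges
to $\gamma_2^0$, and so on, for $i = t_{k_0-1}+1,\dots,t_{k_0}$, $\gamma_i^n$ converges to $\gamma_{k_0}^0$, and if $t_{k_0} < k$, for some $\tilde k \le k$,
for $i = t_{k_0}+1,\dots,\tilde k$, $\gamma_i^n$ converges to some $\gamma_i \notin \{\gamma_1^0,\dots,\gamma_{k_0}^0\}$, and for $i = \tilde k+1,\dots,k$,
$\|\gamma_i^n\|$ tends to infinity. It is possible that $\tilde k = t_{k_0}$, in which case no $\gamma_i^n$ converges to some
$\gamma_i \notin \{\gamma_1^0,\dots,\gamma_{k_0}^0\}$. Such a $t = (t_1,\dots,t_{k_0}) \in T$ exists, because if $\|f_{\ell,\theta^n} - f_{\ell,\theta_0}\|_1/N_n(\theta^n)$
tends to 0 as $n$ tends to infinity, $\|f_{\ell,\theta^n} - f_{\ell,\theta_0}\|_1$ and $N_n(\theta^n)$ tend to 0 as $n$ tends to infinity
(if it was not the case, using the regularity of $\theta \mapsto f_{\ell,\theta}$ we would have a contradiction). Now
along the subsequence we may write, for large enough $n$:

$$N_n(\theta^n) = \sum_{I\notin\{1,\dots,t_{k_0}\}^\ell} P_{\theta^n}(X_{1:\ell} = I) + \sum_{I\in\{1,\dots,k_0\}^\ell} \Bigg[ \Big| \sum_{J\in J(i_1)\times\cdots\times J(i_\ell)} P_{\theta^n}(X_{1:\ell} = J) - P_{\theta_0}(X_{1:\ell} = I) \Big|$$
$$\qquad + \Big\| \sum_{J\in J(i_1)\times\cdots\times J(i_\ell)} P_{\theta^n}(X_{1:\ell} = J)(\gamma_J^n - \gamma_I^0) \Big\| + \frac12 \sum_{J\in J(i_1)\times\cdots\times J(i_\ell)} P_{\theta^n}(X_{1:\ell} = J)\,\|\gamma_J^n - \gamma_I^0\|^2 \Bigg].$$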

We shall use Taylor expansion till order 2. To be perfectly rigourous in the following, we 
need write / in terms of its distinct indices, (ii, • • • , and Gjj = nl=i Oji =it 9ii (yj)' 
however we shall not make such a distinction, so that unless otherwise stated, in such a case 
(7j - l^iY^D^G^o can denote 



BC 

t=i 



9lu ' 






and similarly for the second derivatives. We have 

/^{i,...,tfcj' ie{i,...,koV 



JeJ{ii)x---xJ{ie) 



+ Pe(^i:. = J)(7,7-7?)^C'G,o + i J2 P«(^i:£ = J)(77-7?)^I>'G,;(7j-7?) 

>/G./(ii)x---xJ(i£) Jg J(ii)x---x J(if) 

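with $\bar\gamma_J \in (\gamma_J^n, \gamma_I^0)$. Thus, using the fact that for all $y$, $g_\gamma(y)$ vanishes as $\|\gamma\|$ tends to infinity,
$(f_{\ell,\theta^n} - f_{\ell,\theta_0})/N_n(\theta^n)$ converges pointwise along a subsequence to a function $h$ of form

$$h = \sum_{I\notin\{1,\dots,t_{k_0}\}^\ell} \pi_I G_{\gamma_I} + \sum_{I\in\{1,\dots,k_0\}^\ell} \left(a_I G_{\gamma_I^0} + b_I^T D^1 G_{\gamma_I^0}\right) + \sum_{I\in\{1,\dots,k_0\}^\ell} c_I \sum_{J\in J(i_1)\times\cdots\times J(i_\ell)} \alpha_{I,J}\, z_{I,J}^T D^2 G_{\gamma_I^0} z_{I,J}$$

as in condition (L($\ell$)), with $\sum_{I\notin\{1,\dots,t_{k_0}\}^\ell} \pi_I + \sum_{I\in\{1,\dots,k_0\}^\ell} (|a_I| + \|b_I\| + c_I) = 1$. As
$\|f_{\ell,\theta^n} - f_{\ell,\theta_0}\|_1/N_n(\theta^n)$ tends to 0 as $n$ tends to infinity, we have $\|h\|_1 = 0$ by Fatou's lemma,
and thus $h = 0$, contradicting the assumption.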

Let us now prove that (L(1)) implies (L($\ell$)). Let

k—tkQ ko ko ti — ti-i 

'^i9'r^ + {"■'■dff + bfD^djf) +Y^'i ^i^i^Z^'^9i°z,^j 

2—1 z— 1 i—1 3 — ^ 

be a function as in (11). If it equals 0, by grouping the terms depending only on $y_1$, we
can rewrite the equation as

k ko 

Y '^'iiv'^r-- ,ye)9-fAyi) + Y {"-'^^y^' " ' ^yi)9j°iyi) + b'i^iy2,--- ,yi)D^9i«{yi)) 

ka ti~ti-i ko 

+ J2 J2 J2 ""'i aijzij{ifD^g^^iyi)zij{i) = 

(26) 

where we have written 

zi,j = (zi,j{ii),- ■ ■ ,zi^j{ie)), with I ^ {ii, ■ ■ ■ J=(ji,,j^), € R'^ 

and 

c'l = c/ J|g^o (yt) 

t=2 

Note that if for i — 1, - ■ ■ ,ko and j — 1, - ■ ■ ,ti — ti-i, there exists Wij E M'^ such that 
Y ^'i Y Oii,JZi,j{{)'^D'^gj^{yi)zij{i) = wfjD'^g^^{yi)w,.j 

i2,--- ,if = l (j2:--- ,je)&J(.i2)x---x,J{ie) 

where possibly Wij — 0. Let aij = WwijW^ /{J2l'=i'^^ ll^ijlP) if there exists j such that 
\\w^J^ > and Y.^2,^, C/ EjC/'"' ll^ijlP, then 

k-O ti — ti-i 

Y Y ^'i Y aijzi.j(if D'^g^^{yi)zi.j(i) = c[ Y otUi^ljD'^ 9l^i.yl)w^. 

3 = 1 i2,--- ,ie = l {h,--- ,3i)<^J(i2)x---xJ(ii) j = l 
and (12) implies that

a- = c- = 0, 6- = z = 1, • • • , fco, TT- = 0, i = t/co + 1, • • • fc 
Simple calculations imply that 

fe i 

K= ^iW9i,M)=^ 4^V(i2,,«£) e {I,--- ,fc}^"^7r,,,,,..,i, =0 

and similarly if i is such that there exists j = I,-- - — / = [1,12, ■■■ ,ie) and 

= (j: J2, ■ ■ ■ G ^(i) X • • • X J{ie) such that c/ > 0, aj > and > 0, then 

Ci,i2, - ,ii ~ ^ *2, ■ ■ ■ I Else, by considering yt for some other t, we obtain that ([26]) 

implies that 

^^ = V/^ {l,...,^feJ^ c/ = V/e {!,..., tfcj^ 

This leads to 

E ^/n57H(yt) = Vz = l,---,fco. 

A simple recursive argument implies that $b_I = 0$ for all $I \in \{1,\dots,t_{k_0}\}^\ell$, which in turn
implies that $a_I = 0$ for all $I \in \{1,\dots,t_{k_0}\}^\ell$, and condition (L($\ell$)) is verified.

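4.5 Proof of Corollary 2

Theorem 2 gives that, for some $u_n$ tending to 0 (depending on whether (F1) or (FE1)
holds),

$$P^\pi\left[\theta : \|f_{\ell,\theta} - f_{\ell,\theta_0}\|_1 \ge u_n \mid Y_{1:n}\right] = o_{P_{\theta_0}}(1).$$

Thus, to verify Assumption (A4) of Theorem 1, we may replace $\mathcal{F}_n$ in the definition of the
$A_{n,m}(\epsilon)$ by

$$\left\{\theta : \|f_{\ell,\theta} - f_{\ell,\theta_0}\|_1 \le u_n\right\}.$$

Moreover, using (F2), one gets that there exists a constant $C > 0$ such that

$$\|f_{\ell,\theta} - f_{\ell,\theta_0}\|_1 \le C\left[\sum_{1\le i_1,\dots,i_\ell\le k_0} \left|P_\theta(X_{1:\ell} = i_1\cdots i_\ell) - P_{\theta_0}(X_{1:\ell} = i_1\cdots i_\ell)\right| + \sum_{j=1}^{k_0} P_\theta(X_1 = j)\,\|\gamma_j - \gamma_j^0\|\right].$$

Recall that, following the arguments in Section 3.3, if $\|f_{\ell,\theta} - f_{\ell,\theta_0}\|_1 \lesssim u_n$, there also exists
$c > 0$ such that, for large enough $n$,

$$\|f_{\ell,\theta} - f_{\ell,\theta_0}\|_1 \ge c\left[\sum_{1\le i_1,\dots,i_\ell\le k_0} \left|P_\theta(X_{1:\ell} = i_1\cdots i_\ell) - P_{\theta_0}(X_{1:\ell} = i_1\cdots i_\ell)\right| + \sum_{j=1}^{k_0} P_\theta(X_1 = j)\,\|\gamma_j - \gamma_j^0\|\right]. \qquad (27)$$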

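Since $\rho_\theta - 1$ and $R_\theta$ may be chosen to be lower and upper bounded respectively in the neighborhood
of $\theta_0$, for any sequence $\epsilon_n$ tending to 0 we have that there exist $c_1 > 0$, $c_2 > 0$, $c_3 > 0$,
$c_4 > 0$ such that, for large enough $n$, for any integer $m$,

$$\left\{\theta : c_1 m\epsilon_n \le \sum_{1\le i_1,\dots,i_\ell\le k_0} \left|P_\theta(X_{1:\ell} = i_1\cdots i_\ell) - P_{\theta_0}(X_{1:\ell} = i_1\cdots i_\ell)\right| + \sum_{j=1}^{k_0} \|\gamma_j - \gamma_j^0\| \le c_2(m+1)\epsilon_n\right\}$$

is a subset of $A_{n,m}(\epsilon_n)$, which itself is a subset of

$$\left\{\theta : c_3 m\epsilon_n \le \sum_{1\le i_1,\dots,i_\ell\le k_0} \left|P_\theta(X_{1:\ell} = i_1\cdots i_\ell) - P_{\theta_0}(X_{1:\ell} = i_1\cdots i_\ell)\right| + \sum_{j=1}^{k_0} \|\gamma_j - \gamma_j^0\| \le c_4(m+1)\epsilon_n\right\}.$$

Using the fact that the prior has a positive density, we obtain that there exists M such that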
for any ra and large enough n, tt {An.m (e™)) ^ M (me„)^ with D = kQ{ko — 1) + kod, and 
using usual Euclidian computations that N{^, An^m (cn) (.,.)) < Mm^ for some other 
constant Af. Applying Theorem [1] we obtain that for any sequence M„ tending to infinity, 

p{0) - 1 



■ WfLOo - fieh: 



< ^\y.n 



2Rg + pie) - 1 - 

Arguing as before, for large enough n, pg > ka for some a > and 
p{9) - 1 M 



= l + op,„(l) 



2Rg + p(0) - 1 V" V" 



for some $K > 0$. We thus obtain that for any sequence $M_n$ tending to infinity,



: WAeo-fiAi < ^\yi:r. 

\ n 



l + op«o(l)- 

The second part of Corollary 2 follows now from equation (27).

4.6 Proof of Corollary 3

We first prove that $P^\pi(B_n \mid Y_{1:n}) = 1 + o_P(1)$ by applying Theorem 1. It will be proved below
that one may take $D = 3d/2$ in Assumption (A1). Then, we only need to prove that (A4)
holds. By Proposition 1 we get that there exist $c(\theta_0) > 0$ and $\eta > 0$ such that:

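• If $\|\gamma_1 - \gamma^0\| \le \eta$ and $\|\gamma_2 - \gamma^0\| \le \eta$,

$$\|f_{2,\theta} - f_{2,\theta_0}\|_1 \ge c(\theta_0)\,\frac{1}{p+q}\left[\|q(\gamma_1 - \gamma^0) + p(\gamma_2 - \gamma^0)\| + q\|\gamma_1 - \gamma^0\|^2 + p\|\gamma_2 - \gamma^0\|^2\right].$$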



If II71 - 7°ll < ^ and II71 - 7"|| + II72 - 7°ll > 2^7, 



V 



p+q p+q 



• If II72 - 7°ll < V and II71 - 7OII + II72 - 7°ll > 2?y, 



||/2,e-/2.eolli >c(0o) 
If II71 - 7"|| > 77 and II72 - 7°|| > i], 



p 



p+q p+q 



71-7' 



72 - 7 



||/2,e-/2,eolli >c(0o). 

Similar upper bounds hold also. Then, if $\epsilon_n$ is a sequence tending to 0, for any $m$, $A_{n,m}(\epsilon_n)$
is a subset of the set of $\theta$'s such that



{p + q)A{2-{p + q)) 



p + q 

{p + q)A{2-{p + q)) 
p + q 



\\q{ii - 7") +p{'i2 - 7°)|| + q - 7°|f +p II72 - 7°| 



r ^ II 0111 (P + g) A (2- (p + g)) r ^ II 01 

[p + q\hi - 7 ; : 9 + P 72 - 7 

II IN p + q 



(p + q) A [2 -(p + q))} < (TO + l)e„ 



and $A_{n,m}(\epsilon_n)$ contains the set of $\theta$'s such that
'{p + q) A [2 -(p + q)) 



p + q 

(p + g) A (2 - (p + q)) 
p + q 



[p + q 



9(71 - 7°) +P(72 - 7°)|| + q - 7"^ +P II72 - 7°ir 

I 0111 (p + g) A (2 - (p + q)) r II o||. 

71 - 7 ; : g + P 72 - 7 ; 

I IN p + q " " 

(p + (7)A(2-(p + g))}<(77i + l)e„. 
This leads to

^ < [{m + l)e„]2" + [(m + l)e„]2'3 + [(m + l)e„]"+'^ 

and 

so that (A4) holds as soon as $\alpha > 3d/4$ and $\beta > 3d/4$.
We now prove that (16) holds. Define



(p + g) A (2 - (p + q)) 
p + q 



k(7i - 7°) +Pil2 - 7°)|| + 9 Ihi - 7°||^ +P II72 - 7"||^ 



< Un 



[ P + q 

and 

5^ = {b + '?)A(2-(p + g)) <M„}. 

Then 

Notice that on $A_n$, if $p + q > 1$, then $p \le \epsilon_n$ and $q \ge 1 - \epsilon_n$, or $q \le \epsilon_n$ and $p \ge 1 - \epsilon_n$, so
that also $2 - (p + q) \ge 1 - \epsilon_n$.

• On A„ n ||g(7i - 7°) +p(72 - 7°)|| ^ g ||7i - 7°||^ < m„, p ||72 - 7°||^ < u„, 
and p < e„ or q < e„. This gives 7r2(A„ n 5^) < 'uj^'''''^^e"~''^^. 

• On An n p < u„ and g ||7i — 7'^|| ^ u„ in case p + q < 1, and p "^Un, ^ — q ^ Un 
and g II71 — 7°|| < Un in case p + g > 1, leading to 'K2{An n -B^) < + 

• For symmetry reasons, 7r2(A„ n SfJ = 7r2(^,i n i?,^J. 



• On An n i?^ , p < ii„ and g < Un, so that 7r2(A„ n B^J < u^". 



Keeping only the leading terms, we see that (16) holds.

We finally prove that (13) holds with $D = d + d/2$ and $C_n$ tending to infinity, which will
finish the proof of Corollary 3. Let us introduce the set, for small but fixed $\epsilon$:

C/n= (e= (P,'7,7i,72) : Il7i-7"ll' < Il72-7°ll' < lk(7i " 7°) +p(72 - 7'')ll < 

\<l-\\<^, \P~\\<^ 

so that UnCl Bn, and 7r2([/„ > n^^''/". Thus 

Ju„xX 



Let us now study $\ell_n(\theta, x) - \ell_n(\theta_0, x)$. First, following the proof of Lemma 2 of (Douc et al.,
2004) we find that, for any $\theta \in U_n$, for any $x$,



Thus, for any $\theta \in U_n$ and any $x$, and since $\ell_n(\theta_0, x)$ does not depend on $x$,

^„(0,:E)-f„(0o,a;) >4.(0)-^„(0o)- f}^) • (28) 
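Let us now study $\ell_n(\theta) - \ell_n(\theta_0)$.

$$\ell_n(\theta) - \ell_n(\theta_0) = \sum_{k=1}^{n} \log\left[P_\theta(X_k = 1 \mid Y_{1:k-1})\,\frac{g_{\gamma_1}}{g_{\gamma^0}}(Y_k) + P_\theta(X_k = 2 \mid Y_{1:k-1})\,\frac{g_{\gamma_2}}{g_{\gamma^0}}(Y_k)\right]$$

and we set for $k = 1$

$$P_\theta(X_k = 1 \mid Y_{1:k-1}) = P_\theta(X_1 = 1) = \frac{q}{p+q}, \qquad P_\theta(X_k = 2 \mid Y_{1:k-1}) = P_\theta(X_1 = 2) = \frac{p}{p+q}.$$

Denote $p_k(\theta)$ the random variable $P_\theta(X_k = 1 \mid Y_{1:k-1})$, which is a function of $Y_{1:k-1}$ and thus
independent of $Y_k$. We have the recursion

$$p_{k+1}(\theta) = \frac{(1-p)\,p_k(\theta)\,g_{\gamma_1}(Y_k) + q\,(1 - p_k(\theta))\,g_{\gamma_2}(Y_k)}{p_k(\theta)\,g_{\gamma_1}(Y_k) + (1 - p_k(\theta))\,g_{\gamma_2}(Y_k)}. \qquad (29)$$

Note that, for any $p, q$ in $]0, 1[$, for any $k \ge 1$,

$$p_k(p, q, \gamma, \gamma) = \frac{q}{p+q}.$$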

We shall denote by $D^i_{(\gamma_1)^j,(\gamma_2)^{i-j}}$ the partial derivative operator $j$ times with respect to
$\gamma_1$ and $i - j$ times with respect to $\gamma_2$ ($0 \le j \le i$; the order in which derivatives are taken does
not matter). Fix $\theta = (p, q, \gamma_1, \gamma_2) \in U_n$. When derivatives are taken at point $(p, q, \gamma^0, \gamma^0)$,
they are written with 0 as superscript.

Using Taylor expansion till order 4, there exists $t \in [0, 1]$ such that, denoting $\theta_t = t\theta + (1 - t)(p, q, \gamma^0, \gamma^0)$:

$$\ell_n(\theta) - \ell_n(\theta_0) = (\gamma_1 - \gamma^0) D^1_{\gamma_1}\ell_n^0 + (\gamma_2 - \gamma^0) D^1_{\gamma_2}\ell_n^0 + S_n(\theta) + T_n(\theta) + R_n(\theta, t) \qquad (30)$$

where $S_n(\theta)$ denotes the term of order 2, $T_n(\theta)$ denotes the term of order 3, and $R_n(\theta, t)$
the remainder, that is

Sn{e) = (71 - I'fDl^^yll + 2(71 - 7°)(72 - l')D',,,,A + (72 - 7°)'^^,,).^° , 



Tn{9) = (71 - 7°)'^f,,)3^" + 3(71 - 7°)'(72 - 7°)i?f,,)^7/° 



and 



+ 3(71 - 7°)(72 - irK,M^^n + (72 - 7°)^i^f,.)3^° 
i?„(^,t) = E ( 5 ) (71 - 7°)'(72 - 7°)'-'=^f,,).,(^,)4-.4(et). 

fe=0 ^ ^ 



Easy but tedious computations lead to the following results. 



(7i-7°)i^^,^°+(72-7°)i?^/° 



^ r)l 

.fc=i ^T° 



9(71 - 7°) +P(72 - 7°) 



P + q 



V^j^l 570 



\/n' 



«(7i-7°)+P(72-7°) 



so that 



sup 1(71 - l')DU°n + (72 - i')DUI\ = Op. (1) . 



eeUn 



(31) 
Also,



Sn{0) = - 



9(71 - 7 ) +P{l2 - 7 ) 



2 



L"^*^ k=l ^70 

Using (29) one gets that for all integer $k \ge 2$ ($D^1_{\gamma_1} p_1^0 = 0$ and $D^1_{\gamma_2} p_1^0 = 0$):



k-i ^i9io 



570 



and 

which leads to 
and 

Thus, we obtain 



nl T)0 - -n^ rP 
^j2Pk — ^iiPk 



E>\9io 
9io 



(Yi) 



sup \Snm=Op,^ (1). 



(32) 



For the order 3 term, as soon as $\theta \in U_n$,



Tn{e) 



E 

fc=l 



9fo 



{Yk) 



9(71 - 7°) +P{l2 - 7°) 



570 



^(71 -7°)' + ^(72 -7°)' 
p + q p + q 



n pii r)2 _ 

E ^ O^'c) ^ (Yk) 
k=l 570 570 



9(71 - 7°) +P(72 - 7°) 



p + q 



q 



[p + qY 



71-7^ + 



{p + g)' 



(72 



k=l 



k=l 



9to 



9lo ^70 



k=l 



^ 7~)1 

'Z^(.^(7l,72)P* 



fc=l 



570 






so that using assumption (15)

^sup \T,,{e)\ = Or,^ + Or,^ (1) + O (n-^/^) Z„ 

with 



1 " 



fc=i 



570 



9to 



Now using (29) one gets that for all integer $k \ge 1$,



n2 nO 



pq- 



-^7^70 



t1 Q\^l3lo 



/ ^757( 



/ 570 



pq 



Dl2 9jo 



1 n2 - 9 

7 — :-'^(72)2P'c+i - ^ 



1-p-q 

1 



\ 1 

J 570 



(P + 9)^ 570 

^7^570 



iYk)+Df^^ypl, 



pq 



(p + q)^ 570 



(^fe)+^(72)^P°' 



1 ^-^^(71,72^+1 



= 2 



Pg(g - P) I ^7-970 

{p + qf [ 570 



-0:^570 



570 



and using D^^^^aP? = 0, D^^^yp'l ~ 0, D^^^ 72)?'i ^ ^^^^ "^^^^ tedious computations one 
gets that for some finite C > 0, 



570 



{Yi) 



'Dig JO 



570 



570 



Edo 



, 570 



so that we finally obtain 



$$\sup_{\theta\in U_n} |T_n(\theta)| = O_{P_{\theta_0}}(1).$$
Let us finally study the fourth order remainder $R_n(\theta, t)$. We have

1 " 

sup \Rn{0,t)\ < - ^Ak,nBk.n, 



(33) 



fc=i 



where, for big enough n, A^^n is a polynomial of degree at most 4 in sup^/g^^^^o ^) || II 
and -Bfc,n is a sum of terms of form 



sup 



nn 

i=li=0 



D 



t7l)^(72)'-^-P'--(^*) 



where the j are non negative integers such that J2t=i X]i=o ''^i.i — 



To prove that 



sup \Rn{e,t)\^Op. (1) 



(34) 



(35) 



holds, it is enough to prove that $E_{\theta_0}\left|\sum_{k=1}^{n} A_{k,n}B_{k,n}\right| = O(n)$. But for each $k$, $p_k(\theta)$ and
its derivatives depend on $Y_1,\dots,Y_{k-1}$ only, so that $A_{k,n}$ and $B_{k,n}$ are independent random
variables, and



Ak,7iBk ,n 



k=l 



< 



< C max i?o„ 

j=l,2,3,4 ' 



sup 

.7'eSd(7",£) 



\^iY,)f]j:Eo„\Bk. 

9^' J fe=l 
for some finite $C > 0$. Now, using (29) one gets that for all integer $k \ge 1$ and for any $\theta$,
Z?^ »fc+i (p) ^ q)\ ■ 5 ■ > 

Notice that for any $\theta$, any $k \ge 2$, $p_k(\theta)$ lies between $1 - p$ and $q$, so that for any $\theta \in U_n$, any $k \ge 2$,
$p_k(\theta) \in [\tfrac12 - \epsilon, \tfrac12 + \epsilon]$. We obtain easily that for $i = 1, 2$, $k \ge 2$,



sup \Dlvk+i < 



2e 



1 -8e 



sup 



^757' 



7'eSd(70,£) 57' 



(Ffe)!! + sup llJ^^Pfci 



Using similar tricks, it is possible to get that there exists a finite constant $C > 0$ such that
for any $i = 1, 2, 3, 4$, any $j = 0,\dots,i$, any $k \ge 2$,



sup 



D 



(71)^,(72)' 



i I 



,Pk+i{9) <Ce\ sup sup 



1=1 m=0 



D 



I 

(71)^,(72)' 



By recursion, we obtain that there exists a finite $C > 0$ such that any term of form (34) has
expectation uniformly bounded:



sup 



nnK.)^(72)'-?'fc(^*) 

i=l j=0 



< C max max Ei 



II 7 ^7 N 
1 sup (Fi) 



m=l,2,3,4 '■=1,2,3,4 \ygs^(^o ,) g^, 



which concludes the proof of (35).

Now, using (28), (30), (31), (32), (33) and (35), we get



so that (13) holds with $D = d + d/2$ and any $C_n$ tending to infinity.